Spam Hammer

October 14, 2005 | Posted by Aubrey Turner

I think I’m starting to get a little bit of a handle on referrer spam, although I’ve had to be pretty ruthless about what gets filtered. But since my “referrer” page is not published anymore, I consider anyone trying to hit it as a spammer. It’s not perfect, but it’s better, and my CPU usage is now down to acceptable levels. There were 21148 requests for the referrer page, of which all but 2057 were rejected. The problem is that these bastards keep buying new domain names to replace the ones that are blocked.

But along the way I’ve discovered that they’re also hitting my trackback script, to the tune of 1987 hits yesterday. This is troubling, as the volume appears to have increased since I began blocking referrers. Unfortunately, these hits contribute to server load because EE has to validate the “token” (I use randomized trackback URLs) and then filter the content. None of yesterday’s attempts were successful, though, thanks to the filtering. The problem with these is that there is nothing in access.log to filter on: the request is an HTTP POST, so the log doesn’t show what they were trying to pass. So for now I’m blocking the worst offenders by IP. It’s not likely that any legitimate user will attempt to post 10 or more trackbacks from the same IP in one day.

The following bit of UNIX command-line hackery is what I use to determine the offenders. It reports the IP of each system that has submitted 10 or more trackback requests during the previous day.
grep trackback access.log.2005-10-13 | grep -v 403 | grep -v 503 | awk '{ print $1 }' | sort | uniq -c | awk '{ if (strtonum($1)>=10) print $1,$2; }'

Here’s an example of the output:
20 212.142.33.108
11 216.56.240.71
56 217.219.39.3
108 219.144.196.226
12 219.93.174.101
21 219.93.174.102
12 219.93.174.105
13 219.93.174.109
26 63.144.59.210
59 63.144.59.211
14 64.89.16.7
10 67.50.44.156
10 82.110.130.58
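
One weakness of filtering with grep -v 403 is that it drops any line containing those digits anywhere, for example a line with a byte count of 14035. A possible variant, sketched below, checks the status field itself. It assumes the Apache "combined" log format (client IP in field 1, request path in field 7, status in field 9), and the function name trackback_offenders is made up for this example.

```shell
# Sketch only: assumes "combined" log format, where field 1 is the client IP,
# field 7 is the request path and field 9 is the HTTP status code.
# Reads log lines on stdin; the optional first argument is the minimum
# number of requests to report (default 10).
trackback_offenders() {
    awk -v min="${1:-10}" '
        # Count trackback requests that were not rejected with 403 or 503.
        $7 ~ /trackback/ && $9 != 403 && $9 != 503 { hits[$1]++ }
        # Print "count ip" for every address at or above the threshold.
        END { for (ip in hits) if (hits[ip] >= min) print hits[ip], ip }
    ' | sort -k2
}

# Example use (commented out so that sourcing this file reads nothing):
# trackback_offenders 10 < access.log.2005-10-13
```

The output has the same "count IP" shape as the list above, sorted by address.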

Finding and printing the referrer spammers who leaked through the filters is a little more challenging, since some of them use a full HTML <a> tag in their referrer and some don’t. I suspect that there is some handy-dandy regular expression that would make this simpler, but I’m not a regex guru. It’s also interesting that some of them (for some reason) are using my own domain in the referrer. I suspect this is a simplistic attempt to get me to blacklist myself, but I’m not sure. Given all that, here’s an example of what I use to identify the worst referrer offenders for the previous day.
grep referrer access.log.2005-10-13 | grep -v 403 | grep -v 503 | grep -v aubreyturner | awk '{ if ($11=="\"<a"){ t=substr($12,6); print substr(t,0,index(t,">")-1)} else print substr($11,2,length($11)-2);}' | sort | uniq -c | awk '{ if(strtonum($1)>=10) print $1,$2; }'

And an example of the output:
215 -
88 http://agrino.org/uichsa/wwwboard/567.html
86 http://agrino.org/uichsa/wwwboard/568.html
86 http://agrino.org/uichsa/wwwboard/569.html
85 http://agrino.org/uichsa/wwwboard/570.html
84 http://agrino.org/uichsa/wwwboard/644.html
48 http://generic-######.splinder.com
204 http://#############.50webs.com
32 http://tinman.cs.gsu.edu/~cscjghx/csc3360/wwwboard/messages/86.html
32 http://www.horrorseek.com/horror/dreadful/wwwboard/34.html
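
On the question of a regular expression that handles both the plain and the <a>-tag referrers: one possibility, offered as a rough sketch rather than a tested replacement, is to let awk's match() pull out the first URL it finds in the referrer. The sketch assumes the "combined" log format with the referrer starting at field 11, assumes spam URLs begin with http:// or https://, and uses a made-up function name, referrer_urls.

```shell
# Sketch only. Assumes the "combined" log format, with the referrer starting
# at field 11 (as in the command above), and that referrer URLs begin with
# http:// or https://. The name referrer_urls is made up for this example.
# Reads log lines on stdin and prints one referrer URL (or "-") per line.
referrer_urls() {
    awk '{
        # An <a ...> referrer is split at the space, so its URL sits in field 12.
        field = ($11 == "\"<a") ? $12 : $11
        # Grab the first URL, stopping at a space, quote, angle bracket or backslash.
        if (match(field, /https?:\/\/[^ "<>\\]+/))
            print substr(field, RSTART, RLENGTH)
        else
            print "-"    # blank, "-", or anything that is not a URL
    }'
}

# Example use (commented out so that sourcing this file reads nothing):
# grep referrer access.log.2005-10-13 | grep -v aubreyturner | referrer_urls | sort | uniq -c
```

Because the pattern stops at quotes, angle brackets and backslashes, it should not matter whether the href value is quoted, unquoted, or logged with escaped quotes. Unlike the command above, it reports a blank referrer as “-”.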

As you can see, a lot of the requests have a blank or “-” referrer. Those are particularly troublesome in that they’re hard to block (except by IP, but that’s a losing game). I’m not sure what they intend to gain from hitting the referrer URL without any referrer. All it ends up doing is sending them a nearly-blank page (about 100 bytes of almost static content).

One of these days I guess I’ll glue the above commands together into a nightly job that sends me a report in email. Unless these idiots magically disappear before I get tired of doing this manually…
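
A nightly job along those lines might look roughly like the sketch below. Several pieces are assumptions that do not come from this post: GNU date (for -d yesterday), a working mail command, the placeholder paths and address, and the function name spam_report. The referrer section only counts the requests that got through; the longer referrer pipeline could be slotted in the same way.

```shell
# Sketch only. Assumptions: GNU date (for "-d yesterday"), a working mail(1)
# command, and the placeholder paths and address shown in the comments below.
# spam_report is a made-up name; it prints a plain-text report for one log file.
spam_report() {
    log="$1"
    echo "Trackback offenders (10 or more requests):"
    # Same pipeline as the one-liner above, with a plain numeric comparison
    # in place of strtonum so that any awk will do.
    grep trackback "$log" | grep -v 403 | grep -v 503 |
        awk '{ print $1 }' | sort | uniq -c |
        awk '$1 >= 10 { print $1, $2 }'
    echo
    echo "Referrer page requests that were not rejected:"
    # grep -c exits non-zero when the count is 0, hence the "|| true".
    grep referrer "$log" | grep -v 403 | grep -v 503 | grep -vc aubreyturner || true
}

# Example nightly use (commented out so that sourcing this file sends nothing):
# log="/path/to/logs/access.log.$(date -d yesterday +%Y-%m-%d)"
# spam_report "$log" | mail -s "Spam report for $log" you@example.com
#
# Hypothetical crontab entry to run the script at 00:15 every night:
# 15 0 * * * /path/to/spam-report.sh
```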

(Updated to try out word censoring for ###### and a couple of other words…)

Categories: Random Ramblings


aubreyturner.org
