The problem: comment spam
Comment spam is a phenomenon that has unfortunately increased over the last few months. Here is a description of what it is about:
Another comprehensive description:
The purpose behind these spam comments isn't to be mean to the weblogger -- there is nothing personal in these attacks. The purpose also isn't to con weblog post visitors into clicking through to the site -- our traffic isn't that heavy.
(No, not even the top sites (10,000 unique visitors a day isn't even considered a medium traffic site outside of weblogging circles.))
The Lolita comment spammer's only purpose is to get links into weblog posts for that infamous and most cherished little web crawler -- the Google bot. By doing so, the URL that gets put into the weblog post achieves a higher Google PageRank based on the active links in our weblogs, and consequently, the URL is going to show towards the front of search results when searching on a particular keyword. Such as porn. Such as viagra.
Blogs are by nature open to comments from everyone. Being able to drop a quick note without having to register is vital to many blogs and is commonly regarded as one of the most important factors that make blogs interesting for visitors.
There are more and more concerns over comment spam on weblogs. So far, most spam comments are a means to improve the Google PageRank of the spammers' websites by putting their URLs in the commenter's website field. Weblogs are heavily linked web pages, and the PageRank algorithm is heavily based on links pointing into and out of a web page, so it isn't difficult to see that blog comment spam will increase in the future, especially as weblogging is getting more and more trendy.
The open nature of blogs seems to attract spamvertisers, and as a result there are currently two (arguably three) types of spam comments:
- The decent spam comments:
Forms for posting a comment to a blog usually consist of four text fields: one for your name, one for an email address, one for a URL (as a pointer to your own blog, for example) and one for the comment itself. The decent spam comments use the URL field to inject the spamvertised URL into the comment. The comment itself is short, such as "Good post!" or "Nice comments, guys". That's the rather smart method.
- The obtrusive spam comments:
The comment itself consists of a list of keywords, with each keyword being a link to a spamvertised site. That's the brute-force method.
- Obtrusive-gone-smarter spam comments:
This is actually a modification of the simple obtrusive spam comments. It's not very common yet, but this type can be expected to increase as soon as blogs start to use Bayesian filtering. Keyword-link combinations get embedded in real-world text that the spammer pastes from any normal website. The goal is to fool naive Bayesian filters, rendering their decision tables useless sooner or later.
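All three types rely on getting a link published, either through the URL field or inside the comment body, so a filter has to look at both places. The following is a minimal sketch, assuming comments arrive as plain strings and that links are written either as HTML `<a href>` tags or as bare http(s) URLs; `collect_urls` is a hypothetical helper for illustration, not part of blam.

```python
import re

# Matches the target of an HTML link (href="..." or href='...').
HREF_RE = re.compile(r"""href\s*=\s*["']([^"']+)["']""", re.IGNORECASE)
# Matches bare http(s) URLs; deliberately simple, not a full URL grammar.
BARE_RE = re.compile(r"""https?://[^\s"'<>]+""", re.IGNORECASE)


def collect_urls(url_field: str, body: str) -> list[str]:
    """Return every URL a comment would publish, in order, without duplicates."""
    found = []
    if url_field.strip():
        # "Decent" spam: the spamvertised URL sits in the URL field.
        found.append(url_field.strip())
    # "Obtrusive" spam: the links sit inside the comment body.
    found.extend(HREF_RE.findall(body))
    found.extend(BARE_RE.findall(body))
    # dict.fromkeys keeps the first occurrence of each URL and drops repeats
    # (a URL inside href="..." is also picked up by BARE_RE).
    return list(dict.fromkeys(found))


print(collect_urls("http://spam.example", "Good post!"))
print(collect_urls("", '<a href="http://a.example">pills</a> '
                       '<a href="http://b.example">casino</a>'))
```

The first call yields only the URL-field link, the second only the links from the body; both end up in the same list, which is what a URL blacklist would then be checked against.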
Whether your blog has been hit by the smart or the brute-force method, you should remove these comments as soon as possible. The reason: spam in a blog seems to attract other spammers, as has been reported recently by the Kelsey Consulting Group.
A possible solution: blacklists
Bloggers around the world have come up with a huge list of ideas that could help in fighting and preventing comment spam. Most of them either hurt the usability of a blog (from the readers' point of view) or can easily be circumvented by spammers.
In our eyes the most promising technique is blacklisting. This means keeping lists of "items" that you look for in every comment that gets posted. The most useful items to look out for in the context of blog spam are keywords and URLs. Both are essential for blog spam to work, and spammers can't easily (if at all) get around this fact.
A blacklist consists of a list of matching patterns. If any pattern matches a comment, the comment is regarded as spam. An example of such a blacklist is the master list of the MT-blacklist plugin.
Blacklists are not free of caveats, either. First of all, they need to be up to date in order to work properly. In other words: if you don't update your matching patterns regularly, you will probably miss comments that belong to a new spam wave. You also need to be sure that the patterns don't accidentally match wanted comments, which would result in "false positives". And to protect yourself from bogus patterns that spammers may try to inject, you need trustworthy sources for your reference (this is what is called a "web of trust").
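To make the matching rule concrete, here is a minimal sketch of a pattern blacklist. It assumes a plain-text list with one regular expression per line, blank lines ignored and `#` starting a comment; that layout, the sample entries and the helper names `parse_blacklist` and `is_spam` are illustrative assumptions, not the actual MT-blacklist or blam format.

```python
import re

# Hypothetical sample list: one regular expression per line, '#' starts a comment.
SAMPLE_BLACKLIST = r"""
# keywords
viagra
online[\s-]*casino

# domains (hypothetical)
spam-pills\.example
"""


def parse_blacklist(text: str) -> list[re.Pattern]:
    """Compile one case-insensitive pattern per non-empty, non-comment line."""
    patterns = []
    for line in text.splitlines():
        # Assumption: '#' never appears inside a pattern in this simple format.
        entry = line.split("#", 1)[0].strip()
        if entry:
            patterns.append(re.compile(entry, re.IGNORECASE))
    return patterns


def is_spam(comment: str, patterns: list[re.Pattern]) -> bool:
    """A comment counts as spam as soon as any pattern matches anywhere in it."""
    return any(p.search(comment) for p in patterns)


patterns = parse_blacklist(SAMPLE_BLACKLIST)
print(is_spam("Cheap VIAGRA here", patterns))
print(is_spam("Nice post, thanks!", patterns))
```

Here the first comment is flagged and the second is not. Since `is_spam` only sees a string, the URL field and the comment body can simply be concatenated before the check.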
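One way to guard against false positives, whether from a sloppy pattern or from one that a spammer injected, is to test every candidate pattern against comments you already know are wanted before adopting it. The sketch below assumes the blog owner keeps such an archive of approved comments; `false_positives` and `APPROVED` are hypothetical names used for illustration.

```python
import re


def false_positives(pattern: str, good_comments: list[str]) -> list[str]:
    """Return the known-good comments that a candidate pattern would wrongly flag."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [c for c in good_comments if rx.search(c)]


# Hypothetical archive of comments the blog owner has already approved.
APPROVED = [
    "Thanks, I forwarded this post to a specialist.",
    "Great write-up on trackbacks.",
]

# An unanchored keyword also matches inside "specialist" ...
print(false_positives(r"cialis", APPROVED))
# prints ['Thanks, I forwarded this post to a specialist.']

# ... while word boundaries (\b) restrict it to the standalone word.
print(false_positives(r"\bcialis\b", APPROVED))
# prints []
```

A pattern that produces a non-empty result here would be rejected, or at least held back for manual review, instead of going straight into the active blacklist.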
The goals of blam
Strictly speaking the blam project is a "two in one" thing:
- Part 1 of the project deals with the design of formats and protocols for data exchange. Exchanging data is necessary to build a "web of trust" for comment spam blacklists. Hopefully our work will at some point become something like a pseudo-standard in the blogging community.
- Part 2 of the project will provide a reference implementation of the formats and protocols that result from the work of part 1. This should help authors of blog software to implement blacklist support faster, as the management side of blacklisting will then already be available as a standalone solution that runs independently of the blog software actually in use. This tool will help blog owners with all the management tasks, such as correlating their own blacklists with the trusted blacklists of other bloggers.
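Correlating one's own blacklist with those of trusted bloggers could look roughly like the sketch below. The merge policy is purely an assumption for illustration, not a blam specification: the owner's own patterns are always kept, and a pattern from other blogs is adopted only once at least `min_sources` distinct trusted sources list it, so a single compromised or careless source cannot push a pattern in on its own. `merge_blacklists` and the source names are hypothetical.

```python
from collections import Counter


def merge_blacklists(own: set[str],
                     trusted: dict[str, set[str]],
                     min_sources: int = 2) -> set[str]:
    """Combine the owner's patterns with patterns vouched for by trusted sources.

    Hypothetical policy: a foreign pattern is adopted only if at least
    `min_sources` distinct trusted sources list it; own patterns are always kept.
    """
    votes = Counter()
    for patterns in trusted.values():
        votes.update(set(patterns))  # each source votes at most once per pattern
    adopted = {p for p, n in votes.items() if n >= min_sources}
    return own | adopted


own = {"viagra"}
trusted = {  # hypothetical trusted bloggers and their lists
    "alice": {"online-casino", "cheap-pills"},
    "bob": {"online-casino"},
    "carol": {"online-casino", "cheap-pills"},
}
print(sorted(merge_blacklists(own, trusted)))
print(sorted(merge_blacklists(own, trusted, min_sources=3)))
```

With the default threshold both foreign patterns are adopted (listed by three and two sources respectively); raising the threshold to three keeps only `online-casino`. A real tool would likely weight sources by trust level rather than count them equally.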
Who can participate in the project?
Virtually anyone is welcome to be part of the project team (except those scallywags that are spamming our blogs). It doesn't matter whether you have programming experience or not: you could help by contributing ideas to discussions, writing and correcting documentation, and testing the software. Interested? Then join us!
Who is behind this all?
The project was founded by Michael "Mike" Renzmann (also (un)known as otaku42). He is quite new to blogging, but his blog has nevertheless been hit by spam comments several times. Since then he has started contributing code to WordPress and came up with the idea of a standalone blacklist manager. He hopes that the project will be joined by some of the "great names" of the scene who fight against comment spam; Jay Allen, author of the blacklist plugin for Movable Type, is one of those who have already joined. :)
About the name and the slogan
"blam" is simply made up of the first letters of "blacklist manager", which is the main meaning of the name. The second meaning is inspired by comics...
Just blam. Like the sound of a large trout that gets whacked right into the face of a spamming moron. You see?
The slogan "thwarting blog spamming scallywags" was invented on the fly by snowchyld and biznatchimon. Mike went into #joiito (a channel on the freenode IRC network) asking for help with the English wording of a possible slogan. Thanks again to you both :)