Fighting Spam on Typo with Logic
I realized that spam bots are stupid, and spammers are generally not the best programmers, so a while back I made a system to fight the intolerable spam plaguing us. I noted today that Robby on Rails was having this same problem, and I figured I might as well share what has worked for me.
First, I added the uncommented lines below to my views/articles/_comment_box.rhtml (lines starting with # are unchanged Typo code, shown for context):
# <td><p><label for="comment_body">Your message</label></p></td>
# <td valign="top" colspan="2">
# <%= text_area "comment", "body" %>
# </td>
# </tr>
<tr>
<td>
<p>
<% spammer_array = [["two","9","twelve","2"][rand(4)],["1","15","4","eight"][rand(4)]] %>
<% question = "What's #{spammer_array[0]} times #{spammer_array[1]} ? (numerical)" %>
<label for="spammers_suck"><%= question %></label>
</p>
</td>
<td> <%= text_field_tag "spammers_suck" %><%= hidden_field_tag "spammers_question", question %></td>
</tr>
# <tr>
# <td colspan="2" id="frm-btns">
So far, it’s simply a new table row with some junk in it. The interesting part is that every time the page is created and cached, it contains a new random multiplication question for the user to answer. The question travels in the hidden spammers_question field, next to the user’s answer in spammers_suck, and both are sent along with the request to post a comment (not the preview, mind you) to the comment action.
# Again: Commented parts are unchanged from Typo codebase
# def comment
# unless @request.xhr? || this_blog.sp_allow_non_ajax_comments
# render_error("non-ajax commenting is disabled")
# return
# end
#AntiSpam
words = params[:spammers_question].split(" ")  # "What's X times Y ? (numerical)"
values = [[2, 9, 12, 2], [1, 15, 4, 8]]  # numeric value of each label below
labels = [["two","9","twelve","2"], ["1","15","4","eight"]]
num_one = 0
num_two = 0
values[0].each_with_index{|t,i| num_one = t if words[1] == labels[0][i]}  # first operand
values[1].each_with_index{|t,i| num_two = t if words[3] == labels[1][i]}  # second operand
if not params[:spammers_suck].to_i == num_one * num_two
render_text "You're either a spammer, or you can't do math."
# elsif request.post?
# begin
# @article = this_blog.published_articles.find(params[:id])
# ...
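For anyone who wants to play with the idea outside of Typo, here is a minimal sketch of the same round trip in plain Ruby. It is not the code running on my blog: the constant and method names (FIRST_OPERANDS, SECOND_OPERANDS, build_question, correct_answer?) are made up for illustration, and it assumes a current Ruby where Array#sample is available.

```ruby
# Hypothetical standalone version of the question/answer round trip.
# The labels and values mirror the arrays used in the view and the controller.
FIRST_OPERANDS  = { "two" => 2, "9" => 9, "twelve" => 12, "2" => 2 }.freeze
SECOND_OPERANDS = { "1" => 1, "15" => 15, "4" => 4, "eight" => 8 }.freeze

# Build the question text that would go into the label and the hidden field.
# Operands are picked at random unless the caller supplies them.
def build_question(first = FIRST_OPERANDS.keys.sample, second = SECOND_OPERANDS.keys.sample)
  "What's #{first} times #{second} ? (numerical)"
end

# Recompute the expected product from the question text and compare it with
# the submitted answer. Missing or unrecognized operands are rejected.
def correct_answer?(question, answer)
  words  = question.to_s.split(" ")
  first  = FIRST_OPERANDS[words[1]]
  second = SECOND_OPERANDS[words[3]]
  return false if first.nil? || second.nil?

  answer.to_s.strip == (first * second).to_s
end
```

One design choice worth pointing out: the sketch refuses questions it does not recognize instead of falling back to 0 for both operands. In Ruby "".to_i is 0, so with a fallback of 0 a made-up question plus a blank answer would compare 0 with 0 and get through.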
This very simple hack has brought comment spam on my blog to a complete halt. I also globally disabled trackbacks (which took a manual database query in the end), and so far the only spam-like comment I’ve gotten was a hate comment ;). So, the moral of the story is that you don’t have to put up with spam in Typo, and you don’t have to use Akismet or some other external service to fight it. Just some simple math is all it takes to pwn the noob-bots.

The problem I have with putting something exactly like this in Typo itself is that, as soon as something becomes predictable, the spammers can either program their way around it or use Mechanical Turk-like techniques to evade it. For instance, the classic way of evading a captcha is simply to serve the same captcha up on, say, a free porn site and ask the user to complete it to see more porn. The robot then takes that answer, feeds it back to the original site, and continues on its merry way.
It’s generally not worth the spammers’ while doing this for a unique captcha system, but if enough sites start using a particular scheme, then there’s more incentive for the spammer to work around it.
That said, I am thinking of how to set up an anti-spam plugin architecture that will allow Typo users to either design their own gatekeepers or pick some other plugin. That way, gatekeepers can continue to evolve in a way that’s decoupled from the blogging software. Hopefully there will be a much wider variety of them, and the value to the spammer of working around any single one of them will be dramatically reduced.
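To make the gatekeeper idea a little more concrete, here is a rough sketch of what a registry of pluggable checks could look like in plain Ruby. Nothing here exists in Typo: the Gatekeepers module, its register and allow? methods, and the two example checks are all hypothetical, and a real plugin architecture would need a good deal more than this.

```ruby
# Hypothetical gatekeeper registry; none of these names exist in Typo.
module Gatekeepers
  @registry = {}

  class << self
    # Register a gatekeeper under a name. The block receives the submitted
    # params hash and returns true when the comment may pass.
    def register(name, &check)
      @registry[name] = check
    end

    # Run only the gatekeepers this blog has enabled; every one must approve.
    def allow?(enabled, params)
      enabled.all? do |name|
        check = @registry.fetch(name) { raise ArgumentError, "unknown gatekeeper: #{name}" }
        check.call(params)
      end
    end
  end
end

# Two illustrative gatekeepers a blog owner might write or install.
Gatekeepers.register(:not_blank) { |params| !params[:body].to_s.strip.empty? }
Gatekeepers.register(:simple_math) do |params|
  params[:answer].to_s.strip == (params[:left].to_i * params[:right].to_i).to_s
end
```

Each blog would then call something like Gatekeepers.allow?([:not_blank, :simple_math], params) with its own list of enabled checks, so no two sites need to present the same obstacle.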
My personal roadmap for 4.x->5 is a bunch of fixes to make us more RESTful (@/admin@ must DIE) and a complete overhaul/creation of Typo’s plugin structure. Expect Hpricot to crop up as either a required gem or a subdirectory of vendor, and for a bunch of what I’m currently thinking of as ‘structural’ callbacks to appear. Event-based callbacks are so _WordPress_, don’t you think?
That would rock. I say do it! I was also considering beating the mechanical turks by asking about things on the site itself. That way, the person would have to be on my site to get past the spam check, which shouldn’t be a big deal.
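A question about the site itself could be as simple as asking for one word of the article’s title. The following is only a sketch of that idea under my own assumptions: ORDINALS, normalize, title_word_question and title_word_answer? are hypothetical names, and the choice of the title as the thing to ask about is just one option.

```ruby
# Hypothetical check based on the article title shown on the page.
ORDINALS = %w[first second third fourth fifth].freeze

# Lowercase a word and drop punctuation so "Spam!" and "spam" compare equal.
def normalize(word)
  word.to_s.downcase.gsub(/[^a-z0-9]/, "")
end

# Build the question for a zero-based word position (0 is the first word).
def title_word_question(position)
  "What is the #{ORDINALS.fetch(position)} word of this article's title?"
end

# True only when the title has a word at that position and the answer matches it.
def title_word_answer?(title, position, answer)
  expected = normalize(title.to_s.split[position])
  !expected.empty? && expected == normalize(answer)
end
```

For the title "Fighting Spam on Typo with Logic", title_word_question(1) asks for the second word and title_word_answer? accepts "spam" in any capitalization.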