Skirmish with Spam

I was getting tired of the comment spam on my site. When I started using Mollom half a year ago, my comment spam problem disappeared overnight. But, lately, the number of spam attempts has risen so much that even Mollom lets through enough spam to get me a bit miffed.

So, I decided to go “old school”… I was going to add math problems of increasing difficulty until the spam stopped. It was going to be a War on Spam. But, as the experiments that I describe next show, the spam stopped rather quickly, and I had to retitle the article “Skirmish with Spam”.

I am, by the way, fully aware that tricks like this will only work for sites that aren’t specifically targeted by the spammers. It will never work for large sites like Google, since the spammers would just manually code around it in no time… But, for the time being, I don’t think they’ll go to that much trouble for my site. I’m also aware that all of this is only valid for “my” spambots, in the sense that the spambots that happen to target my site are possibly not at all representative of spambots in general.

For this experiment, I added a new field to the comment form, “3 + 4 = ?”, with “Spam avoidance measure, sorry for this.” as the description. Leaving the field empty results in the error message “3 + 4 = ? field is required”, and a wrong answer results in “Please answer 7 in the 3 + 4 = ? field”. I wanted to make this as painless as possible for the occasional human that gets in between the ’bots.

In Drupal, which I use to run this site, you can do this by adding functions like the following to a custom module. I have a module called tr that I use to customize a few things around here, so the function names start with tr_.

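<?php
/**
 * Alters the comment form: adds a required arithmetic question
 * and registers the validation function below.
 */
function tr_form_comment_form_alter(&$form, &$form_state, $form_id) {
  $form['question'] = array(
    '#type' => 'textfield',
    '#title' => '3 + 4 = ?',
    '#description' => 'Spam avoidance measure, sorry for this.',
    // A negative weight moves the field towards the top of the form.
    '#weight' => -1,
    '#required' => TRUE,
  );
  $form['#validate'][] = 'tr_comment_form_validate';
}

/**
 * Rejects the comment unless the question was answered with 7.
 */
function tr_comment_form_validate($form, &$form_state) {
  // Trim first, so that a human who types "7 " is not turned away.
  if (trim($form_state['values']['question']) != '7') {
    form_set_error('question', 'Please answer 7 in the 3 + 4 = ? field.');
  }
}
?>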

The Results

As the table shows, I get tens of spam attempts per day (each column represents about 24 hours). I knew, from trying something like this in the past, that a simple sum (“3 + 4 = ?”) probably wouldn’t work. In practice, it didn’t work at all. The spambots actually computed the result: 25 of the 26 answers to the anti-spam question were, simply, correct. It’s not that they filled in a random number and only the sevens got through. In only one of the 26 attempts was the anti-spam question field filled with spam text.

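| Anti-spam question           | None | 3 + 4 = ? | 3 x 4 = ? | 8 / 2 = ? | 8 / (1 - 1) = ? |
|------------------------------|------|-----------|-----------|-----------|-----------------|
| Attempts to spam             | 31   | 26        | 51        | 66        | 68              |
| Correct answer               | -    | 25        | 0         | 59        | 0               |
| Incorrect (numerical) answer | -    | 0         | 9         | 2         | 1               |
| Left field empty             | -    | 0         | 32        | 0         | 64              |
| Text or spam keywords        | -    | 1         | 10        | 5         | 3               |
| “Accepted” comments          | 31   | 25        | 0         | 59        | 0               |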
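
The article doesn’t say how the answers were sorted into the rows of the table. As a rough sketch of one way to do it, the function below puts a submitted answer into one of the four categories. The name classify_answer and the sample submissions are made up for this illustration and are not part of the tr module.

```php
<?php
// Hypothetical helper (not from the article): sort one submitted answer
// into the categories used in the table.
function classify_answer(string $answer, string $expected): string {
  $answer = trim($answer);
  if ($answer === '') {
    return 'empty';
  }
  if ($answer === $expected) {
    return 'correct';
  }
  // is_numeric() accepts integer and decimal strings such as "3" or "3.5".
  if (is_numeric($answer)) {
    return 'incorrect';
  }
  return 'text';
}

// Tally a few made-up submissions for the "3 x 4 = ?" question.
$tally = array('correct' => 0, 'incorrect' => 0, 'empty' => 0, 'text' => 0);
foreach (array('3', '', 'buy now', '12', '  ') as $answer) {
  $tally[classify_answer($answer, '12')]++;
}
print_r($tally);
```

An answer that consists only of whitespace is counted as empty here, which is a choice of this sketch rather than something the table specifies.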

Next, I replaced the sum by “3 x 4 = ?”. I expected that the number of wrong answers would greatly increase, especially since I used the Unicode multiplication character (x), instead of the asterisk (*) that is used in most programming languages. For the addition, the spambot can simply “eval” the expression in some way to arrive at the correct result. This will no longer work with the x character. And, indeed, there was not a single correct answer out of 51 attempts. More interesting is that most of the attempts left the field empty, or filled it with spam text. So, the spambots seem to notice that they can’t answer the question. Or, maybe more likely, they leave it empty because they don’t recognize what the field means. I got the answer “3” nine times, which seems to be an attempt at evaluating the expression that failed at the unknown character x. I tried the same thing with the HTML multiplication character (×, created with &times;), with similar results.
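
The answer “3” is what a parser produces when it reads the leading digits and gives up at the first character it doesn’t know. PHP’s intval() happens to behave that way, so it makes for a small demonstration. This is only a guess at the failure mode: nothing here tells us what language or parsing code the spambots really use.

```php
<?php
// Assumption: the bot cuts off the trailing "= ?" and converts what is
// left to a number. This is a guess, not observed bot code.
$question = '3 x 4 = ?';
$expression = trim(explode('=', $question)[0]);  // "3 x 4"

// intval() reads the leading digits and stops at the first character it
// cannot parse, so everything from the "x" onwards is ignored.
$answer = intval($expression);

echo $answer, "\n";  // prints 3
```

The same thing happens with “3 × 4”, since the conversion stops right after the 3 in both cases.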

After quickly testing that “3 * 4 = ?” immediately resulted in correct responses, I made it a bit more difficult, and put in “3 + 4 * 2 = ?”. This also resulted in correct responses, so it seems that the spambots really do some sort of “eval” of the expression. Next, I tried “8 / 2 = ?” for a longer test, again resulting in mostly correct responses. The spambots even correctly solve “8 / (1 + 1) = ?”. They can really handle general expressions.

I then tried to have the spambots solve “8 / 0 = ?” for me, and even “8 / (1 - 1) = ?”, but that didn’t seem to crash them… They still submitted the form, leaving the field empty most of the time. So they seem to handle exceptions correctly.
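
To make the observed behaviour a bit more concrete, here is a small sketch of an evaluator that would respond the same way: it respects operator precedence and parentheses, and gives up (returns null, so the field would stay empty) on a division by zero or on a character it doesn’t know. All function names are hypothetical, and this is certainly not the spambots’ actual code, which I have never seen.

```php
<?php
// Hypothetical evaluator, for illustration only. Returns the value of an
// arithmetic expression, or null when it cannot be computed.
// Unary minus is not supported, to keep the sketch short.
function evaluate_expression(string $text): ?float {
  // Accept only numbers, the four ASCII operators and parentheses.
  // Anything else (a letter "x", a "×" sign) makes the input unparseable.
  if (!preg_match('/^(?:\s*(?:\d+(?:\.\d+)?|[-+*\/()]))+\s*$/', $text)) {
    return null;
  }
  preg_match_all('/\d+(?:\.\d+)?|[-+*\/()]/', $text, $matches);
  $tokens = $matches[0];
  $pos = 0;
  try {
    $value = parse_sum($tokens, $pos);
  } catch (Exception $e) {
    return null;
  }
  // Leftover tokens mean the expression was malformed.
  return $pos === count($tokens) ? $value : null;
}

// sum := product (('+' | '-') product)*
function parse_sum(array $tokens, int &$pos): float {
  $value = parse_product($tokens, $pos);
  while ($pos < count($tokens) && ($tokens[$pos] === '+' || $tokens[$pos] === '-')) {
    $op = $tokens[$pos++];
    $right = parse_product($tokens, $pos);
    $value = $op === '+' ? $value + $right : $value - $right;
  }
  return $value;
}

// product := atom (('*' | '/') atom)*
function parse_product(array $tokens, int &$pos): float {
  $value = parse_atom($tokens, $pos);
  while ($pos < count($tokens) && ($tokens[$pos] === '*' || $tokens[$pos] === '/')) {
    $op = $tokens[$pos++];
    $right = parse_atom($tokens, $pos);
    if ($op === '/' && $right == 0) {
      throw new Exception('division by zero');
    }
    $value = $op === '*' ? $value * $right : $value / $right;
  }
  return $value;
}

// atom := number | '(' sum ')'
function parse_atom(array $tokens, int &$pos): float {
  if ($pos >= count($tokens)) {
    throw new Exception('unexpected end of expression');
  }
  $token = $tokens[$pos++];
  if ($token === '(') {
    $value = parse_sum($tokens, $pos);
    if ($pos >= count($tokens) || $tokens[$pos++] !== ')') {
      throw new Exception('missing closing parenthesis');
    }
    return $value;
  }
  if (is_numeric($token)) {
    return (float) $token;
  }
  throw new Exception('unexpected token');
}

// Try the questions from the experiment, minus the trailing "= ?".
foreach (array('3 + 4 * 2 = ?', '8 / (1 + 1) = ?', '8 / (1 - 1) = ?', '3 x 4 = ?') as $question) {
  $result = evaluate_expression(explode('=', $question)[0]);
  echo $question, ' -> ', $result === null ? '(field left empty)' : $result, "\n";
}
```

With this sketch, the first two questions get a numerical answer and the last two leave the field empty, which is at least consistent with what most of the spambots did.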

I’ve now put in “3 x 4 = ?” again, and I don’t get any spam anymore. Now, if some of you copy this idea, you’ll get rid of your spam problem. If all of you do this, the spambots will be adapted very quickly. What do you think?

[update] I’ve added the Drupal code snippet to make this article more practically useful.

Nassia (not verified)

Sat, 12/22/2012 - 13:24

Simple and clever, very elegant solution to an annoying problem... Let's hope not all of us pick it up though :) Bonus points for the word "miffed" and substituting the word skirmish when a full-blown war was just a bit too much :)

Luc (not verified)

Mon, 12/24/2012 - 12:47

At least some bots use slightly more logic than just eval. Using similar logic as yourself, we asked "Are you human? How much is seven plus one?" and at least one spambot always gets through. I've changed the question to "How much is one more than &#55;". Let's see how it goes...

I had also tried to use lots of html character entities, by replacing “3 + 4 = ?” with “&#51;&#32;&#43;&#32;&#52;&#32;&#61;&#32;&#63;”, but that didn't help at all…

Submitted on 22 December 2012