Helping to scan books by fighting spam
Tuesday, November 20th, 2007
After my new website was online, it took spammers only 4 days to find it and flood it with comment spam.
So I decided to add a captcha. But to turn the anti-spam measure into something useful, I chose reCAPTCHA. This project helps archive.org with its effort to scan books. OCR is not 100% accurate, and reCAPTCHA gets people to decode the words that the computer can't decode but humans can probably read.
What the captcha basically does is present you with two words: one that it already knows how to decode, and one that it was unable to read. If you get the known word right, it assumes that you also read the other one correctly, and uses your interpretation for the book scan. To prevent abuse and improve quality, it can send the same unknown word to multiple people and compare their answers.
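To make the two-word check described above a bit more concrete, here is a minimal Python sketch. It is not reCAPTCHA's actual code: the function names, the in-memory `guesses` store, and the three-vote threshold are all assumptions made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical in-memory store: unknown-word id -> list of human guesses.
guesses = defaultdict(list)


def normalize(text):
    """Ignore case and surrounding whitespace when comparing words."""
    return text.strip().lower()


def check_answer(control_answer, typed_control, unknown_id, typed_unknown):
    """Accept the visitor only if the known (control) word is typed correctly.

    When it is, the guess for the unknown word is recorded as one vote.
    """
    if normalize(typed_control) != normalize(control_answer):
        return False
    guesses[unknown_id].append(normalize(typed_unknown))
    return True


def accepted_reading(unknown_id, min_votes=3):
    """Return the most common guess once min_votes people agree, else None.

    min_votes=3 is an arbitrary value chosen for this sketch.
    """
    counts = Counter(guesses[unknown_id])
    if not counts:
        return None
    word, votes = counts.most_common(1)[0]
    return word if votes >= min_votes else None
```

A guess is only counted when the control word was right, so a spam bot that fails the known word never influences the scanned text, and a single person cannot decide a word alone.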
I haven't used it much myself, and have encountered it on only a few sites. If you are having trouble commenting on my site because of this captcha, let me know. (How can you let me know if you can't comment? You probably know my e-mail address, and if you don't, you can easily guess.)
If you like the way it works and want to support the project, add it to your own site. If you use WordPress as I do, it's very easy to set up: you can download a WordPress plugin here.


