clarification/amplification on spam comment interdiction

In response to this

I like the second of the two, since it’s not exclusionary. How hard would it be to defeat? And what countermeasures could be written into it (present the string as HTML entities that have to be decoded by a parser? present the word reversed? don’t use real words at all? make the letter position the result of a simple equation [what letter is in the 2^2 position in the string uiwplkg?]?)

a friend writes:

None of those countermeasures would be effective against a computer parser; most of that stuff doesn’t even matter to a computer, like whether it’s a real word or if there’s an expression to evaluate. That’s all stuff that a computer is really good at.
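To make that point concrete, here is a minimal sketch of how little code it takes to undo three of the proposed countermeasures. The function names are hypothetical, and it assumes the equation puzzle always takes the form "a^b" with a position counted from one; a real spam bot would need to recognise the puzzle type first, but that is also ordinary pattern matching.

```python
import html
import re


def decode_entities(challenge: str) -> str:
    """Undo the 'present the string as HTML entities' countermeasure."""
    return html.unescape(challenge)


def unreverse(word: str) -> str:
    """Undo the 'present the word reversed' countermeasure."""
    return word[::-1]


def letter_at(expression: str, text: str) -> str:
    """Evaluate a simple 'a^b' expression and return the letter at that
    position in text, counting from one (an assumed convention)."""
    match = re.fullmatch(r"\s*(\d+)\s*\^\s*(\d+)\s*", expression)
    if match is None:
        raise ValueError("expected an expression of the form a^b")
    position = int(match.group(1)) ** int(match.group(2))
    return text[position - 1]


print(decode_entities("&#115;&#112;&#97;&#109;"))
print(unreverse("maps"))
print(letter_at("2^2", "uiwplkg"))
```

Whether the string is a real word never enters into it: the nonsense string uiwplkg is handled exactly like any other.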

On the other hand, you could describe the operation to be performed in such a way that it’s hard to get the gist without fully grokking the English:

“In an attempt to verify that you are a living, breathing human being and not a mindless computer program, and not having the time or resources to arrange a Turing Test, we would like you to enter, in the blank below, the answer indicated by the following paragraph.

The previous paragraph contains words of several parts of speech: prepositions, articles, nouns, verbs, pronouns, adjectives. Locate the first word which belongs in that last category and enter the letter which appears in it twice.

Enter answer here:[ ]”

Of course, that’s a bad example because it excludes people who are ignorant of the intricacies of English parts of speech (when is a verb form an adjective?). But it’s the right kind of example, I feel. The idea is to make the statement of the problem as hard to parse as possible. Avoid using digits; spell out numbers and require them to be spelled out. Pull together various parts of the text with references that are unambiguous but not computationally precise. That sort of thing. And of course, there has to be a very, very large set of potential problems and answers so that it doesn’t boil down to capturing all of the questions and memorizing the correct responses with no grokkage required at all.

So randomness (to create a large problem set) and a high degree of difficulty in parsing, in effect negating simple parsing, are the specifications.
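As a rough sketch of the randomness half of that specification only, the following generates a fresh question each time and expects a spelled-out number as the answer. Everything here (the names, the question wording, the eight-letter string) is an assumption made for illustration. A fixed template like this one is exactly what a parser could learn to read, so it does nothing for the hard-to-parse half; it only shows how a large problem set avoids the memorization attack.

```python
import random
import string

# Spelled-out positions, so neither the question nor the answer uses digits.
ORDINALS = ["first", "second", "third", "fourth",
            "fifth", "sixth", "seventh", "eighth"]


def build_challenge(text, index):
    """Return (question, expected_answer) for the letter at text[index]."""
    letter = text[index]
    question = (f"Reading the string '{text}' from left to right, "
                f"which position holds the letter '{letter}'? "
                "Spell out your answer as a word.")
    return question, ORDINALS[index]


def make_challenge(rng=random):
    """Pick a random string of distinct letters and a random position."""
    # Sampling without replacement keeps every letter unique, so the
    # answer is unambiguous.
    text = "".join(rng.sample(string.ascii_lowercase, len(ORDINALS)))
    index = rng.randrange(len(ORDINALS))
    return build_challenge(text, index)


def check(answer, expected):
    """Compare answers, ignoring case and surrounding whitespace."""
    return answer.strip().lower() == expected


question, expected = make_challenge()
print(question)
```

The hard part, phrasing the question so that it takes real comprehension of English to work out what is being asked, is left open here just as it is above.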
