The post doesn’t do justice to the subtlety of Turing’s insight. The Turing test is two-faced in that the interrogator is addressing two contestants, the computer and the human. He doesn’t know which is which, but he hopes that comparing their answers will reveal their identities. But the Turing test is two-faced in a second way.
Turing hopes that the test will satisfy its audience, but that audience contains two groups. There is a pro-AI group. Some of them will have been involved in writing the initial source code of the AI that is taking the test. They are cheering on the AI. Then there is the anti-AI group, staunchly maintaining that computers cannot think. They admire the trickery of the programmers, but refuse to credit the creation with the thoughts of its creators.
Consider a conventional test that resembles a university examination. Perhaps the computer scores high marks. The anti-AI camp refuses to budge. The coders have merely hired experts in the subject being examined and laboured hard to construct a brittle facade of apparent knowledge. Let us change the curriculum,...
But a conventional test has both failure modes. If the computer scores low marks the pro-AI crowd will refuse to budge. The test was too hard and they were not given enough time to prepare. A human student would cope as poorly if you switched the curriculum on him,...
Turing tried to come up with a test that could compel the die-hards in both camps. First he abolishes the curriculum. The interrogator is free to ask whatever questions he wishes. There is no point teaching to the test, for the question “Will this be on the test?” receives no answer. Second he abolishes the pass mark. How well does the computer have to do? As well as a human. And how well is that? We don’t know; a human will take the test at the same time as the computer and the interrogator will not know which is which, unless the incompetence of the computer gives the game away.
The pro-AI camp are between a rock and a hard place. They cannot complain about the lack of a curriculum, for the human doesn’t get a copy of it either: it doesn’t exist. They cannot complain that the questions were too hard, because the human answered them. They cannot complain that the human’s answers were merely a good effort but actually wrong, because they were good enough to let the interrogator recognise human superiority.
The final gambit of the pro-AI camp is to keep the test short. Perhaps the interrogator has some killer questions that will sort the humans from the computers, but he has used them before and the programmers have coded up some canned answers. Keep the test short. If the interrogator starts asking follow-up questions, probing to see if those were the computer’s own answers, probing to see if the computer understands the things it is saying or is merely reciting from memory,...
We come to a tricky impasse. Just how long does the interrogator get?
Perhaps it is the anti-AI crowd that is having a hard time. The computer and the human are both giving good answers to the easy questions. No help there. The computer and the human are both struggling to answer the hard questions. No help there. The medium questions are producing different answers from the two contestants, but sometimes teletype A hammers out a human answer and teletype B tries to dodge, and sometimes it’s the other way round.
There is one fixed point on the non-existent curriculum: childhood. Tell me about your mother, tell me about your brother. The interrogator learns anew the perils of a fixed curriculum. Teletype A has a good cover story. The programmers have put a lot of work into constructing a convincing fiction. Teletype B has a good cover story. The programmers have put a lot of work into constructing a convincing fiction. Which one should the interrogator denounce as non-human? The interrogator regrets wasting half the morning on family history. Fearing embarrassment, he pleads for more time.
The pro-AI camp smirk and say “Of course. Take all the time you need.” After the lunch break the interrogation resumes. After the dinner break the interrogation resumes. The lights go on. People demand stronger coffee as 11pm approaches. Teletype B grows tetchy. “Of course I’m the human, you moron. Why can’t you tell? You are so stupid.” The interrogator is relieved. He has coded chat bots himself. One of his last-ditch defenses was

(defun insult-interrogator ()
  ;; canned last-ditch reply: deflect by insulting the interrogator
  (format *standard-output* "~&You are so stupid."))
He denounces B as non-human, getting it wrong for the fourth time this week. The computer sending to teletype A has passed the Turing test :-)
Whoops! I’m getting carried away writing fiction. The point I’m trying to tack on to Turing’s original insight (no curriculum, no pass mark) is that the pro-AI camp cannot try to keep the test short. If they limit it to a 5-minute interrogation, the anti-AI camp will claim that it takes six minutes to exhaust the chat bot’s opening book, and refuse to concede.
More importantly, the anti-AI camp can develop the technique of smoking out small-state chat-bots by keeping the interrogation going for half an hour and then circling back to the beginning. Of course the human may have forgotten how the interrogation began. It is in the spirit of the test to say that the computer doesn’t have to do better than the human. But the spirit of the Turing test certainly allows the interrogator to try. If the human notices (“Didn’t you ask that earlier?”) and the computer doesn’t, or if the computer slows down as the interrogation proceeds due to an ever-growing state, the computer quite properly fails the Turing Test. (Hmm, I feel that I’m getting sucked into a very narrow vision of what might be involved in passing the Turing Test.)
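To make that failure mode concrete, here is a toy sketch, invented for this post: a bot whose entire conversational state is a five-question window (the window size and every name are made up). Circle back past the window and it cannot notice the repeat.

(defparameter *memory-size* 5)   ; hypothetical: remember only the last 5 questions
(defvar *recent-questions* '())

(defun remember (question)
  (push question *recent-questions*)
  (when (> (length *recent-questions*) *memory-size*)
    (setf *recent-questions* (butlast *recent-questions*))))   ; oldest falls out

(defun answer (question)
  (prog1 (if (member question *recent-questions* :test #'string-equal)
             "Didn't you ask that earlier?"     ; only works inside the window
             "Interesting question. Tell me more.")
    (remember question)))

Ask six questions and then repeat the first: a human might bristle at the repetition, but this bot has already forgotten it.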
If the pro-AI camp want the anti-AI camp to concede, they have to let the anti-AI interrogators keep asking questions until they realise that the extra questions are not helping. The computer is thinking about the questions before answering and can keep it up all day.
I think that you can break a chat-bot out of its opening book with three questions along the following lines.
1) Which is heavier, my big toe or a 747?
2) Which is heavier, a symphony or a sonnet?
3a) Which question do you think is better for smoking out the computer, the first or the second?
3b) Which of the previous two questions is the more metaphorical?
One can imagine a big engineering effort that lets the computer identify objects and estimate their weight. Big toe 10 grams. 747, err, 100 tons. And one can write code that spots and dodges trick questions involving the weight of immaterial objects. But one needs a big, fat opening book to cope with the great variety of individual questions that the interrogator might ask.
Then comes question three. It links together questions one and two, squaring the size of the opening book. 40 seconds into an all-day interrogation and the combinatorial explosion has already gone BOOM!
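For what it’s worth, the single-question half of that engineering effort might look like the sketch below. The weight table and the dodge are my own guesses at the sort of code involved, not anyone’s real entry.

(defparameter *weights*   ; grams; rough guesses, purely for illustration
  '(("big toe" . 20) ("747" . 180000000)))

(defun weight-of (thing)
  (cdr (assoc thing *weights* :test #'string-equal)))

(defun which-is-heavier (a b)
  (let ((wa (weight-of a)) (wb (weight-of b)))
    (if (and wa wb)
        (if (> wa wb) a b)
        "You can't put a symphony on a scale!")))   ; dodge immaterial objects

The arithmetic is the point: N stock questions need N canned answers, but a question that refers back to an earlier question needs an answer per pair of questions, N × N of them.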
Well, those used to be the three questions we asked, but now you’ve gone and ruined the Turing test for everyone. Way to go.
Er, if you’re smart enough to a) write a Turing Test solver b) that’s used in “production” c) in Lisp d) because you’re most comfortable in Lisp …
Don’t you think you would have factored out such a commonly-used conversation primitive to the point that it doesn’t require two keywords (one of them decorated) to invoke?
I know, a nitpick, but it kinda stood out :-P
The combinatorial explosion is on the side of the Turing test, of course. But storage space is on the side of “design to the test”, so if you can make up a nice decisive question, the designer can think of it too (or read your blog) and add it. The question here is whether Stuart (and Ned Block) are right that such a “giant lookup table” a) makes sense and b) has no intelligence. “The intelligence of a toaster”, as Block said.
One thing that I’ve tried with Google is using it to write stories. Start by searching on “Fred was bored and”. Pick “slightly” from the results and search on “was bored and slightly”. Pick “annoyed” from the search results and search on “bored and slightly annoyed”.
Trying this again just now reminds me that I let the sentence fragment grow and grow until I was down to, err, ten? hits. Then I took the next word from a hit that wasn’t making a literal copy, and deleted enough leading words to get the hit count back up.
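In code the procedure looks roughly like this. A tiny in-memory corpus stands in for Google, matching lines stand in for hit counts, and every name is invented.

(defparameter *corpus*
  '("fred was bored and slightly annoyed by the rain"
    "she was bored and slightly hungry after the meeting"
    "bored and slightly annoyed he stared out of the window"))

(defun hits (phrase)
  "Corpus lines containing PHRASE; the stand-in for a web search."
  (remove-if-not (lambda (line) (search phrase line)) *corpus*))

(defun word-after (line phrase)
  "First word following PHRASE in LINE, or NIL."
  (let ((pos (search phrase line)))
    (when pos
      (let* ((tail (string-left-trim " " (subseq line (+ pos (length phrase)))))
             (end (or (position #\Space tail) (length tail))))
        (unless (zerop (length tail))
          (subseq tail 0 end))))))

(defun last-words (text n)
  "The last N words of TEXT, joined by spaces."
  (let ((words (loop with start = 0
                     for pos = (position #\Space text :start start)
                     collect (subseq text start (or pos (length text)))
                     while pos
                     do (setf start (1+ pos)))))
    (format nil "~{~A~^ ~}" (last words n))))

(defun grow-story (seed &key (steps 8) (window 3))
  "Extend SEED one word at a time, searching on only the last WINDOW words
so the hit count stays up. (The original procedure also skipped hits that
would reproduce the story verbatim; this sketch just takes the first.)"
  (let ((story seed))
    (dotimes (i steps story)
      (let* ((phrase (last-words story window))
             (word (some (lambda (line) (word-after line phrase))
                         (hits phrase))))
        (unless word (return story))
        (setf story (concatenate 'string story " " word))))))

(grow-story "fred was bored and") walks out to “fred was bored and slightly annoyed by the rain” and then stalls when the corpus runs dry.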
Anyway, it seemed unpromising because the text lacked long range coherence. Indeed, the thread of the sentences rarely seemed to run significantly longer than the length of the search string.
Perhaps “unpromising” is too harsh. If I were making a serious Turing Test entry I would happily use Big Data and mine the web for grammar rules and idioms. On the other hand I feel the need for a new and different idea for putting some meaning and intelligence behind the words. Otherwise my chat bot would only be able to compete with humans who were terribly, terribly drunk and unable to get from one end of a sentence to the other kind of cricket match where England collapses and we lose the ashes on the way back from the crematorium, which really upset the, make mine a pint, now where was I?
Essentially, you tried to make a Markov-chain story generator. Yes, it generates this type of text, where short fragments look like parts of meaningful text but a longer stretch reveals that it makes no sense.
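For instance, an order-2 chain is only a few lines (illustrative code, not anyone’s real generator); the two-word window plays exactly the role that the shrinking search string played.

(defun tokenize (text)
  (loop with start = 0
        for pos = (position #\Space text :start start)
        collect (subseq text start (or pos (length text)))
        while pos
        do (setf start (1+ pos))))

(defun build-chain (text &key (order 2))
  "Map each ORDER-word window onto the words seen following it."
  (let ((table (make-hash-table :test #'equal)))
    (loop for tail on (tokenize text)
          while (nthcdr order tail)
          do (push (nth order tail) (gethash (subseq tail 0 order) table)))
    table))

(defun babble (table window steps)
  "Walk the chain. Each step consults only the last ORDER words, which is
why nothing coheres beyond the window."
  (let ((out (copy-list window)))
    (dotimes (i steps (format nil "~{~A~^ ~}" out))
      (let ((nexts (gethash window table)))
        (unless nexts (return (format nil "~{~A~^ ~}" out)))
        (let ((word (nth (random (length nexts)) nexts)))
          (setf out (append out (list word))
                window (append (rest window) (list word))))))))

Feed BUILD-CHAIN a long paragraph and BABBLE a two-word seed, and you get exactly the behaviour you describe: plausible fragments, no overall sense.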
Seems to me that there is a mental illness (though I don’t remember which one) where people generate the same kind of speech. Not sure what the philosophical consequences for the Turing test are, though.
You’re probably thinking of word salad, most often associated with schizophrenia but not exclusive to it.
Yep. A genuine giant lookup table would be unfeasibly huge—but it might well be intelligent.
It would count as “intelligent” if it had general skills—say the skill to construct long-term plans that actually worked (as opposed to sounding good in conversation).