The classical problem is that the Turing Test is behavioristic and only provides a sufficient criterion (rather than replacing talk about ‘intelligence’, as Turing suggests). And it doesn’t provide a proper criterion, in that it relies on human judges, who in practice tend to take humans for computers.
Of course it is meant to be open-ended in that “anything one can talk about” is permitted, including stuff that’s not on the web. That is a large set of intelligent behavior, but a limited set, so the “design to the test” you are pointing out is precisely what chatterbot people use. And it’s usually pretty dumb and provides no insight into human flexibility in using language (which can be used for more than “talking about stuff”). I also suspect that machines will pass the test pretty soon, at least with non-sophisticated judges. So far, results are hopeless, however! The main weakness is that the machines don’t do much analysis of the conversation up to that point.
Essentially, we know from similar problems (e.g. speech recognition) that one can get very good, but somewhere in the upper 90% range there is a limit that’s very hard to break without using more data.
The 90% argument is very pertinent, and may be the thing that preserves the Turing test as a general intelligence test.
Or maybe not! We’ll have to see...