There is a problem with the Turing test, practically and philosophically, and I would be willing to bet that the first entity to past the test will not be conscious,
First, you mean “pass”, not “past”, I assume. Second, what definition of “conscious” are you using here? If you reject p-zombies, then it has to be behavioral. From this post it seems like this behavior is “able to pass a secret Turing test most actual humans can pass”, which is not a great definition, given how many actual humans fail the existing Turing test miserably.
given how many actual humans fail the existing Turing test miserably.
Are you referring to some specific thing here, or is this a general “people are dumb” sort of thing? If it’s the former, elaborate please, this sounds interesting!
I highly recommend Brian Christian’s book The Most Human Human, for an interesting perspective on how (most) humans pass the Turing Test.
Interestingly enough, I find numerous references to Clay and this incident, but very little in the way of the transcript it refers to. This was the best I could find (she is “Terminal 4”):
Judge 9: Are you familiar with Hamlet?
Terminal 4: The college kid who came home and found his mom had married the guy who murdered his dad just a little month before? You might say so.
[...]
Judge 1: What is your opinion on Shakespeare’s plays?
Terminal 4: That’s pretty general; would you be more specific? Otherwise, I’ll just say I like them.
Judge 1: Learning that you like them answers my question. Which of his plays is your favorite?
Terminal 4: Well, let’s see...Pericles.
Judge 1: Why is it your favorite?
Terminal 4: Because he was obviously called in to play-doctor somebody’s awful script. It was one of the few (maybe only two?) plays written with somebody else. It’s really rather an ucky play. What play do you like?
Judge 1: I did not understand your response. However, to answer your question, I do not have a favorite.
She does sound somewhat unhuman to me in the latter excerpt you quoted. (But then again, so do certain Wikipedia editors that I’m pretty sure are human.)
Fascinating! Thank you, this is definitely going on my reading list.
It happens once in a while in online chat, where real people behave like chatterbots. Granted, with some probing it is possible to tell the difference, if the suspect stays connected long enough, but there were cases where I was not quite sure. In one case the person (I’m fairly sure it was a person) posted links to their own site and generic sentences like “there are many resources on ”. In another case the person was posting online news snippets and reacted with hostility to any attempts to engage, which is a perfect way to discourage Turing probing if you design a bot.
Imagine a normal test, perhaps a math test in a classroom. Someone knows math but falls asleep and doesn’t answer any of the questions. As a result, they fail the test. Would you say that the test can’t detect whether someone can do math?
Technically, that’s correct. The test doesn’t detect whether someone can do math; it detects whether they are doing math at the time. But it would be stupid to say “hey, you told me this tests if someone can do math! It doesn’t do that at all!” The fact that they used the words “can do” rather than “are doing at the time” is just an example of how human beings don’t use language like a machine, and objecting to that part is pointless.
Likewise, the fact that someone who acts like a chatterbot is detected as a computer by the Turing test does, technically, mean that the test doesn’t detect whether someone is a computer; it detects whether they are acting computerish at the time. But “this test detects whether someone is a computer” is how most people would normally describe that, even if that’s not technically accurate. It’s pointless to object, on those grounds, that the test doesn’t detect whether someone is a computer.
It doesn’t even test whether someone’s doing math at the time. I could be doing all kinds of math and, in consequence, fail the exam.
I would say, rather, that tests generally have implicit preconditions in order for interpretations of their results to be valid.
Standing on a scale is a test for my weight that presumes various things: that I’m not carrying heavy stuff, that I’m not being pulled away from the scale by a significant force, etc. If those presumptions are false and I interpret the scale readings normally, I’ll misjudge my weight. (Similarly, if I instead interpret the scale as a test of my mass, I’m assuming a 1g gravitational field, etc.)
Taking a math test in a classroom makes assumptions about my cognitive state—that I’m awake, trying to pass the exam, can understand the instructions, don’t have a gerbil in my pants, and so forth.
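To put rough numbers on the scale example, here is a minimal Python sketch; the masses, gravities, and the helper function are made-up illustrative values, not anything from the discussion above:
```python
# A scale measures the normal force on it, F = m * g, and converts it to a
# "mass" reading by assuming Earth-normal gravity. The interpretation is only
# valid when the implicit preconditions hold.
G_EARTH = 9.81  # m/s^2
G_MOON = 1.62   # m/s^2

def scale_reading_kg(true_mass_kg, g, carried_kg=0.0):
    """What an Earth-calibrated scale displays, given the actual conditions."""
    force_n = (true_mass_kg + carried_kg) * g  # actual force pressing on the scale
    return force_n / G_EARTH                   # scale converts force to kg assuming Earth gravity

me = 70.0  # true mass, made-up number

print(scale_reading_kg(me, G_EARTH))                   # 70.0  -- preconditions hold, reading is valid
print(scale_reading_kg(me, G_EARTH, carried_kg=20.0))  # 90.0  -- "not carrying heavy stuff" violated
print(scale_reading_kg(me, G_MOON))                    # ~11.6 -- "1g field" assumption violated
```
Same instrument, same readout; whether the number means anything depends entirely on which of those background assumptions actually hold.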
Oops! Now corrected.
I’m not using one. Part of the problem is that the Turing test is measuring something, but it’s not entirely clear what.
Surely it is clear what the Turing test is measuring. It is measuring the ability to pass for a human under certain conditions.
A better question is whether (and in what way) does the ability to pass for a human correlate with other qualities of interest, notably ones which we vaguely describe as “intelligent” or “conscious”.
does the ability to pass for a human correlate with [qualities] which we vaguely describe as “intelligent” or “conscious”[?]
I always thought (and was very convinced of this, though I can’t seem to think of a reason why now) that the Turing test was explicitly designed as a “sufficient” rather than a “necessary” kind of test. As in, you don’t need to pass it to be “human-level”, but if you do, then you certainly are. (Or, more precisely, as long as we’ve established we can’t tell, then who cares? With a similar sentiment for exactly what it is we’re comparing for “human-level”: it’s something about how we’re smarter than monkeys; we’re not sure quite what it is, but we can’t tell the difference, so you’re in.) A brute-force, first-try, upper-bound sort of test.
But I get the feeling from some of the comments that it claims more than that (or maybe doesn’t disclaim as much). Am I missing some literature or something?
I personally agree with your comment (assuming I understand it correctly). As far as I can tell, however, some people believe that merely being able to converse with humans on their own level is not sufficient to establish the agent’s ability to think on the human level. I personally think this belief is misguided, since it privileges implementation details over function, but I could always be wrong.
IIRC, Turing introduces the concept in the paper as a sufficient but not necessary condition, as you describe here.
I feel it may be neither necessary nor sufficient. It would be a pretty strong indication, but wouldn’t be enough on its own.
Yes, that’s the issue.
A better question is whether (and in what way) does the ability to pass for a human correlate with other qualities of interest, notably ones which we vaguely describe as “intelligent” or “conscious”.
Is there any way we can test for consciousness without using some version of the Turing Test? If the answer is “no”, then I don’t see the point of caring about it.
As for “intelligence”, it’s a little trickier. There could be agents out there who are generally intelligent yet utterly inhuman. The Turing Test would not, admittedly, apply to them.
We could use those extended versions of the Turing tests I mentioned—anything that the computer hasn’t been specifically optimised on would work.
I am not sure what you mean by “optimized on”. What if we made an AI that was really good at both chatting and playing music? It could pass your extended test then (while many humans, such as myself, would fail). Now what?
Then I’d test it on 3D movements. The point is that these tests have great validity as tests for general intelligence (or something in the vicinity), as long as the programmer isn’t deliberately optimising or calibrating their machine on them.
If you’d designed a chatterbot and it turned out to be great at playing music (and that wasn’t something you’d put in by hand), then that would be strong evidence for general intelligence.
The deliberate optimization on the part of a designer is just an example of the sort of thing you are concerned about here, right? That is, if I used genetic algorithms to develop a system X, and exposed those algorithms to a set of environments E, X would be optimized for E and consequently any test centered on E (or any subset of it) would be equally unreliable as a test of general intelligence… the important thing is that because X was selected (intentionally or otherwise) to be successful at E, the fact that X is successful at E ought not be treated as evidence that X is generally intelligent.
Yes?
Similarly, the fact that X is successful at tasks not actually present in E, but nevertheless very similar to tasks present in E, ought not be treated as evidence that X is generally intelligent. A small amount of generalization from initial inputs is not that impressive.
The question then becomes how much generalization away from the specific problems presented in E is necessary before we consider X generally intelligent.
To approach the question differently—there are all kinds of cognitive tests which humans fail, because our cognitive systems just weren’t designed to handle the situations those tests measure, because our ancestral environment didn’t contain sufficiently analogous situations. At what point do we therefore conclude that humans aren’t really generally intelligent, just optimized for particular kinds of tests?
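As a toy illustration of the optimisation-versus-generalisation worry above, here is a minimal Python sketch; everything in it (the task, the input ranges, the numbers) is invented for the illustration. A system selected only on an environment E can look competent on E and on E-like inputs while telling us very little about anything further out:
```python
import random

random.seed(0)

# Environment E: the only inputs the system is ever selected on.
E_INPUTS = [i / 10 for i in range(11)]            # 0.0 .. 1.0
NOVEL_INPUTS = [3.0 + i / 10 for i in range(11)]  # 3.0 .. 4.0, never used during selection

def target(x):
    return x * x  # the underlying task: square the input

def mean_sq_error(params, inputs):
    a, b = params  # the "organism" is just a line y = a*x + b
    return sum((a * x + b - target(x)) ** 2 for x in inputs) / len(inputs)

# Crude hill-climbing "evolution": keep mutations that score better on E only.
best = (0.0, 0.0)
best_err = mean_sq_error(best, E_INPUTS)
for _ in range(20000):
    cand = (best[0] + random.gauss(0, 0.1), best[1] + random.gauss(0, 0.1))
    cand_err = mean_sq_error(cand, E_INPUTS)
    if cand_err < best_err:
        best, best_err = cand, cand_err

print("error on E:      ", round(mean_sq_error(best, E_INPUTS), 4))      # small
print("error outside E: ", round(mean_sq_error(best, NOVEL_INPUTS), 4))  # large
```
Doing well on E, or on inputs close to E, is exactly what the selection process buys; it is performance well outside E, the analogue of the chatterbot that unexpectedly turns out to play music, that would carry evidential weight about general ability.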