The flawed Turing test: language, understanding, and partial p-zombies
There is a problem with the Turing test, practically and philosophically, and I would be willing to bet that the first entity to pass the test will not be conscious, or intelligent, or have whatever spark or quality the test is supposed to measure. And I hold this position while fully embracing materialism, and rejecting p-zombies or epiphenomenalism.
The problem is Campbell’s law (or Goodhart’s law):
“The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.”
This applies to more than social indicators. To illustrate, imagine that you were a school inspector, tasked with assessing the all-round education of a group of 14-year-old students. You engage them on the French Revolution and they respond with pertinent contrasts between the Montagnards and Girondins. Your quizzes about the properties of prime numbers are answered with impressive speed, and, when asked, they can all play quite passable pieces from “Die Zauberflöte”.
You feel tempted to give them the seal of approval… but then you learn that the principal had been expecting your questions (you don’t vary them much), and that, in fact, the whole school has spent the last three years doing nothing but studying 18th century France, number theory and Mozart operas—day after day after day. Now you’re less impressed. You can still conclude that the students have some technical ability, but you can’t assess their all-round level of education.
The Turing test functions in the same way. Imagine no-one had heard of the test, and someone created a putative AI, designing it to, say, track rats efficiently across the city. You sit this anti-rat-AI down and give it a Turing test—and, to your astonishment, it passes. You could now conclude that it was (very likely) a genuinely conscious or intelligent entity.
But this is not the case: nearly everyone’s heard of the Turing test. So the first machines to pass will be dedicated systems, specifically designed to get through the test. Their whole setup will be constructed to maximise “passing the test”, not to “being intelligent” or whatever we want the test to measure (the fact we have difficulty stating what exactly the test should be measuring shows the difficulty here).
Of course, this is a matter of degree, not of kind: a machine that passed the Turing test would still be rather nifty, and as the test got longer, and more complicated, as the interactions between subject and judge got more intricate, our confidence that we were facing a truly intelligent machine would increase.
But degree can go a long way. Watson won on Jeopardy without exhibiting any of the skills of a truly intelligent being—apart from one: answering Jeopardy questions. With the rise of big data and statistical algorithms, I would certainly rate it as plausible that we could create beings that seem nearly perfectly conscious from a (textual) linguistic perspective. These “super-chatterbots” could only be identified as such with long and tedious effort. And yet they would demonstrate none of the other attributes of intelligence: chattering is all they’re any good at (if you ask them to do any planning, for instance, they’ll come up with designs that sound good but fail: they parrot back other people’s plans with minimal modifications). These would be the closest plausible analogues to p-zombies.
The best way to avoid this is to create more varied analogues of the Turing test—and to keep them secret. Just as you keep the training set and the test set distinct in machine learning, you want to confront the putative AIs with quasi-Turing tests that their designers will not have encountered or planned for. Mix up the test conditions, add extra requirements, change what is being measured, do something completely different, be unfair: do things that a genuine intelligence would deal with, but an overtrained narrow statistical machine couldn’t.
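The train/test analogy can be made concrete with a minimal sketch (the questions and the memorising "model" here are invented for illustration): a system that has memorised the public test looks perfect on it, and only held-out questions reveal the gap.

```python
# A "model" that memorises its training questions scores perfectly on
# them, but fails on held-out questions it has never seen -- the reason
# the post argues quasi-Turing tests should be kept secret.

train = {"capital of France?": "Paris", "2+2?": "4"}  # public, prepared-for
test = {"capital of Peru?": "Lima"}                   # secret, held out

def memoriser(question):
    # Answers only what it has literally seen before.
    return train.get(question, "I don't know")

train_score = sum(memoriser(q) == a for q, a in train.items()) / len(train)
test_score = sum(memoriser(q) == a for q, a in test.items()) / len(test)

print(train_score)  # 1.0 -- flawless on the known test
print(test_score)   # 0.0 -- useless on unseen questions
```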
The post doesn’t do justice to the subtlety of Turing’s insight. The Turing test is two-faced in that the interrogator is addressing two contestants, the computer and the human. He doesn’t know which is which, but he hopes that comparing their answers will reveal their identities. But the Turing test is two-faced in a second way.
Turing hopes that the test will satisfy its audience, but that audience contains two groups. There is a pro-AI group. Some of them will have been involved in writing the initial source code of the AI that is taking the test. They are cheering on the AI. Then there is the anti-AI group, staunchly maintaining that computers cannot think. They admire the trickery of the programmers, but refuse to credit the creation with the thoughts of its creators.
Consider a conventional test that resembles a university examination. Perhaps the computer scores high marks. The anti-AI camp refuses to budge. The coders have merely hired experts in the subject being examined and laboured hard to construct a brittle facade of apparent knowledge. Let us change the curriculum,...
But a conventional test has both failure modes. If the computer scores low marks the pro-AI crowd will refuse to budge. The test was too hard and they were not given enough time to prepare. A human student would cope as poorly if you switched the curriculum on him,...
Turing tried to come up with a test that could compel die-hards in both camps. First, he abolishes the curriculum. The interrogator is free to ask whatever questions he wishes. There is no point teaching to the test, for the question “Will this be on the test?” receives no answer. Second, he abolishes the pass mark. How well does the computer have to do? As well as a human. And how well is that? We don’t know; a human will take the test at the same time as the computer and the interrogator will not know which is which, unless the incompetence of the computer gives the game away.
The pro-AI camp are between a rock and a hard place. They cannot complain about the lack of a curriculum for the human doesn’t get a copy of it either: it doesn’t exist. They cannot complain that the questions were too hard, because the human answered them. They cannot complain that the human’s answers were merely a good effort but actually wrong, because they were good enough to let the interrogator recognise human superiority.
The final gambit of the pro-AI camp is to keep the test short. Perhaps the interrogator has some killer questions that will sort the humans from the computers, but he has used them before and the programmers have coded up some canned answers. Keep the test short. If the interrogator starts asking follow up questions, probing to see if those were the computer’s own answers, probing to see if the computer understands the things it is saying or reciting from memory,...
We come to a tricky impasse. Just how long does the interrogator get?
Perhaps it is the anti-AI crowd that is having a hard time. The computer and the human are both giving good answers to the easy questions. No help there. The computer and the human are both struggling to answer the hard questions. No help there. The medium questions are producing different answers from the two contestants, but sometimes teletype A hammers out a human answer and teletype B tries to dodge, and sometimes it’s the other way round.
There is one fixed point on the non-existent curriculum: childhood. Tell me about your mother, tell me about your brother. The interrogator learns anew the perils of a fixed curriculum. Teletype A has a good cover story. The programmers have put a lot of work into constructing a convincing fiction. Teletype B has a good cover story. The programmers have put a lot of work into constructing a convincing fiction. Which one should the interrogator denounce as non-human? The interrogator regrets wasting half the morning on family history. Fearing embarrassment he pleads for more time.
The pro-AI camp smirk and say “Of course. Take all the time you need.” After the lunch break the interrogation resumes. After the dinner break the interrogation resumes. The lights go on. People demand stronger coffee as 11pm approaches. Teletype B grows tetchy. “Of course I’m the human, you moron. Why can’t you tell? You are so stupid.” The interrogator is relieved. He has coded chat bots himself. One of his own last-ditch defenses was a canned display of irritation.
He denounces B as non-human, getting it wrong for the fourth time this week. The computer sending to teletype A has passed the Turing test :-)
Whoops! I’m getting carried away writing fiction. The point I’m trying to tack on to Turing’s original insight (no curriculum, no pass mark) is that the pro-AI camp cannot try to keep the test short. If they limit it to a 5 minute interrogation, the anti-AI camp will claim that it takes six minutes to exhaust the chat bot’s opening book, and refuse to concede.
More importantly, the anti-AI camp can develop the technique of smoking out small-state chat-bots by keeping the interrogation going for half an hour and then circling back to the beginning. Of course the human may have forgotten how the interrogation began. It is in the spirit of the test to say that the computer doesn’t have to do better than the human. But the spirit of the Turing test certainly allows the interrogator to try. If the human notices “Didn’t you ask that earlier?” and the computer doesn’t, or slows down as the interrogation proceeds due to an ever-growing state, the computer quite properly fails the Turing Test. (Hmm, I feel that I’m getting sucked into a very narrow vision of what might be involved in passing the Turing Test.)
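The circle-back probe can be sketched in a few lines (the canned answers and bot below are hypothetical, purely to illustrate the idea): a bot with no conversational state repeats an earlier answer byte-for-byte, where a human would normally remark on the repetition.

```python
# Sketch of the "circle back" probe: a stateless chat-bot gives
# identical answers when an earlier question is repeated after a long
# interrogation, because it carries no memory of the conversation.

CANNED = {
    "tell me about your mother": "She was a kind woman who loved gardening.",
    "what do you do for fun": "I enjoy long walks and reading.",
}

def stateless_bot(question):
    # No state: the answer depends only on the question, never on history.
    return CANNED.get(question.lower().rstrip("?. "), "Interesting question!")

def circle_back_probe(bot, question):
    first = bot(question)
    # ... half an hour of other questions would go here ...
    second = bot(question)
    # Verbatim repetition of the earlier answer is suspicious.
    return first == second

print(circle_back_probe(stateless_bot, "Tell me about your mother?"))  # True
```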
If the pro-AI camp want the anti-AI camp to concede, they have to let the anti-AI interrogators keep asking questions until they realise that the extra questions are not helping. The computer is thinking about the questions before answering and can keep it up all day.
I think that you can break a chat-bot out of its opening book with three questions along the following lines.
1) Which is heavier, my big toe or a 747?
2) Which is heavier, a symphony or a sonnet?
3a) Which question do you think is better for smoking out the computer, the first or the second?
3b) Which of the previous two questions is the more metaphorical?
One can imagine a big engineering effort that lets the computer identify objects and estimate their weight. Big toe 10 grams. 747, err, 100 tons. And one can write code that spots and dodges trick questions involving the weight of immaterial objects. But one needs a big, fat opening book to cope with the great variety of individual questions that the interrogator might ask.
Then comes question three. That links together question one and question two, squaring the size of the opening book. 40 seconds into an all day interrogation and the combinatorial explosion has already gone BOOM!
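The squaring argument is simple arithmetic, which a rough sketch makes vivid (the book size below is an arbitrary assumption): every level of cross-reference between earlier questions multiplies the required opening book by its own size.

```python
# Back-of-the-envelope arithmetic for the opening-book blow-up: with N
# pre-stored single-question answers, a question comparing two earlier
# questions needs an entry per ordered pair, and each further level of
# cross-reference multiplies the book by N again.

N = 10_000  # assumed size of the single-question opening book

singles = N
pairs = N ** 2   # "which of those two questions is better?"
triples = N ** 3 # questions referring back to three earlier questions

print(f"{singles:,}")  # 10,000
print(f"{pairs:,}")    # 100,000,000
print(f"{triples:,}")  # 1,000,000,000,000
```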
Well, those used to be the three questions we asked, but now you’ve gone and ruined the Turing test for everyone. Way to go.
Er, if you’re smart enough to a) write a Turing Test solver b) that’s used in “production” c) in Lisp d) because you’re most comfortable in Lisp …
Don’t you think you would have factored out such a commonly-used conversation primitive to the point that it doesn’t require two keywords (one of them decorated) to invoke?
I know, a nitpick, but it kinda stood out :-P
The combinatorial explosion is on the side of the TT, of course. But storage space is on the side of “design to the test”, so if you can make up a nice decisive question, the designer can think of it, too (or read your blog) and add that. The question here is whether Stuart (and Ned Block) are right that such a “giant lookup table” a) makes sense and b) has no intelligence. “The intelligence of a toaster” as Block said.
One thing that I’ve tried with Google is using it to write stories. Start by searching on “Fred was bored and”. Pick “slightly” from the results and search on “was bored and slightly”. Pick “annoyed” from the search results and search on “bored and slightly annoyed”.
Trying this again just now reminds me that I let the sentence fragment grow and grow until I was down to, err, ten? hits. Then I took the next word from a hit that wasn’t making a literal copy, and deleted enough leading words to get the hit count back up.
Anyway, it seemed unpromising because the text lacked long range coherence. Indeed, the thread of the sentences rarely seemed to run significantly longer than the length of the search string.
Perhaps “unpromising” is too harsh. If I were making a serious Turing Test entry I would happily use Big Data and mine the web for grammar rules and idioms. On the other hand I feel the need for a new and different idea for putting some meaning and intelligence behind the words. Otherwise my chat bot would only be able to compete with humans who were terribly, terribly drunk and unable to get from one end a sentence to the other kind of cricket match where England collapses and we lose the ashes on the way back from the crematorium, which really upset the, make mine a pint, now where was I?
Essentially, you tried to make a Markov-chain story generator. Yes, it generates this type of text, where short fragments look like parts of meaningful text, but a longer text reveals that it has no sense.
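A minimal bigram Markov chain shows the effect directly (the toy corpus below is invented): each step is locally plausible because it follows observed word pairs, but nothing enforces coherence beyond one word of context.

```python
# Minimal bigram Markov-chain text generator -- the technique the
# search-engine experiment above approximates by hand. Output is
# locally plausible but has no long-range coherence.

import random

corpus = ("fred was bored and slightly annoyed . fred was tired and "
          "slightly hungry . the cat was bored and very loud .").split()

# Map each word to the list of words that follow it in the corpus.
chain = {}
for a, b in zip(corpus, corpus[1:]):
    chain.setdefault(a, []).append(b)

random.seed(0)
word, out = "fred", ["fred"]
for _ in range(12):
    # Fall back to the whole corpus if a word has no recorded successor.
    word = random.choice(chain.get(word, corpus))
    out.append(word)

print(" ".join(out))
```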
Seems to me that there is a mental illness (though I don’t remember which one it is) where people generate the same kind of speech. Not sure what the philosophical consequences for the Turing test are, though.
You’re probably thinking of word salad, most often associated with schizophrenia but not exclusive to it.
Yep. A genuine giant lookup table would be unfeasibly huge—but it might well be intelligent.
It would count as “intelligent” if it had general skills—say, the skill to construct long-term plans that actually worked (as opposed to sounding good in conversation).
But aren’t these just instances of the Turing test? As the judge, you’re allowed to ask any questions you like to try and distinguish the AI program from the human contestant, including novel and unexpected questions that the entrants have not had any chance to prepare for. At some point you will completely flummox the AI, but you will also flummox the human too. The interesting question then is whether you can tell the difference, i.e. will the AI behave in a near-human way when trying to cope with a baffling and completely unexpected problem? If it does, that is a real sign of intelligence, is it not?
I mean move out of whatever format was agreed on. Move away from text-based systems (a truly smart AI could download voice software if it had to—make sure it has time to do so). Unilaterally extend the deadline. Offer side deals or bets with real money (which a smart AI could acquire or pretend to have). Insist the subject create videos on specific themes.
Do stuff you’re not supposed/expected to do.
There is a risk of asking more from the AI than a human could deliver.
Imagine a mute human, randomly taken from the street, who has to download voice software and use it to communicate with the judge, without making the judge suspect that they are the AI. What chance of success here? Similarly, how many people would lose bets? Etc.
On the other hand, if we prefer to err on the side of underestimating the AI rather than overestimating it, then the more difficult the task, the better, even if some humans couldn’t solve it. But then… why not simply give the AI the task of convincing humans that it is intelligent, without any further rules?
Let’s contrast two situations:
1) We build a whole brain emulation, uploading from a particular brain. We subject that WBE to a Turing test, which it passes. Is it conscious? I’d argue yes: even without a definition of consciousness, we must still grant it to the WBE if we grant it to humans.
2) Same thing, but instead of the WBE, we have a de novo computer system designed specifically to pass the Turing test via mass data crunching. I’d say we now need more proof that this system is conscious.
Why the difference? In the first case the WBE was optimised for being an accurate representation of a brain. So if it passes the Turing test, then it probably is an accurate representation, as it is hard to conceive of a very flawed brain representation that also passes that test.
In the second case, the system was optimised for passing the test only. So it is very possible to conceive it passing the test, but not having the other attributes of consciousness or intelligence. So our tests have to be more rigorous in the second case.
Not that I’ve got particularly good ideas how to do this! I just note that it needs to be done. Maybe “long” Turing tests (6 months or more) might be enough. Or maybe we’ll need to disconnect the AI from the internet (maybe give it a small video feed of some popular TV shows—but only give it info at human-bandwidth), wait for human society to evolve a bit, and test the AI on concepts that weren’t available when it was disconnected.
The form of the AI is also relevant—if it’s optimised for something else, then passing the Turing test is a much stronger indication.
What are these other attributes, as distinct from the attributes it would need to pass the Turing Test ?
Sure, you could ask it to make videos of itself skating or whatever, but a WBE wouldn’t be able to do that, either (seeing as it doesn’t have a body to skate with). Does it mean they both fail ?
I don’t think he meant it that way. I read it as “make a video montage of a meme” or the like. The point being that such a task exercises more elements of “human intelligence” than just chatting, like lexical and visual metaphors, perception of vision and movement, (at least a little bit of) imagination, planning and execution of a technical task, (presumably) using other software purposefully, etc. It is much harder to plan for and “fake” (whatever that means) all of that than to “fake” a text-only test with chat-bot techniques.
Of course, a blind (for instance) real man might not be able to do that particular task, but he will be able to justify that by being convincingly blind in the rest, and would be able to perform something analogous in other domains. (Music or even reciting something emphatically, or perhaps some tactile task that someone familiar with being blind might imagine.) The point I think is not to tie it to a particular sense or the body, but just to get a higher bandwidth channel for testing, one that would be so hard to fake in close to real time that you’d pretty much have to be smarter to do it.
Testing for consciousness seems to be so hard that text chat is not enough (or at least we’re close to being better at faking it than testing for it), so I guess Stuart suggests we take advantage of the “in-built optimizations” that let us do stuff like fake and detect accents or infer distances from differences in apparent height (or, in some contexts, status or other things). Things that we don’t yet fake well, and even when we do, it’s hard to mix and integrate them all.
If you told me personally to do that, I may not pass the test, either. And I personally know several humans who cannot, f.ex., “use other software purposefully”. I think these kinds of challenges are a form of scope creep. We are not trying to test whether the AI is a polymath, just whether it’s human or not.
I disagree; that is, while I agree that participating in many types of interactions is more difficult than participating in a single type of interaction, I disagree that this degree of difficulty is important.
As I said before, in order to hold an engaging conversation with a human through “fakery”, the AI would have to “fake” human-level intelligence. Sure, it could try to steer the conversation toward its own area of expertise—but firstly, this is what real humans do as well, and secondly, it would still have to do so convincingly, knowing full well that its interlocutor may refuse to be steered. I simply don’t know of a way to perfectly “fake” this level of intelligence without actually being intelligent.
You speak of “higher bandwidth channels for testing”, but consider the fact that there are several humans in existence today, at this very moment, whose interaction with you consists entirely of text. Do you accept that they are, in fact, human ? If so, then what’s the difference between them and (hypothetical) Turing-grade AIs ?
I don’t believe it’s scope creep at all. The requirement isn’t really “make a video”. The requirement is “be able to do some of the things in the category ‘human activities that are hard to automate’”. Making a video is a specific item in the category, and the test is not to see that someone can do any specific item in the category, just that they can do some of them. If the human questioner gets told “I don’t know how to make a video”, he’s not going to say “okay, you’re a computer”, he’s going to ask “okay, then how about you do this instead?”, picking another item from the category.
(Note that the human is able to ask the subject to do another item from the category without the human questioner being able to list all the items in the category in advance.)
That is starting to sound like a “Turing Test of the gaps”.
“Chatting online is really hard to automate, let’s test for that.
Ok, we’ve automated chatting, let’s test for musical composition, instead.
Ok, looks like there are AIs that can do that. Let’s test it for calculus...”
My tests would be: have a chatterbot do calculus. Have a musical bot chat. Have a calculus bot do music.
To test for general intelligence, you can’t test on the specific skill the bot’s trained in.
Try to teach the competitor to do some things that make sense to humans and some things that do not make sense to humans, from wildly different fields. If the competitor seems to be confused by things which are confusing to people and learns things which are not confusing, it is more likely to be thinking instead of parroting.
For example, you could explain why no consistent logical system can trust itself, and then ask the competitor if they think their way of thinking is consistent; if they think it isn’t, ask them if they think that they could prove literally anything using their way of thinking. If they think it is, ask them if they would believe everything that they can prove to be true.
Thinking entities will tend to believe that they can’t prove things which are false, and thus that everything that they can prove is true. Calculating entities run into trouble with those concepts.
Less meta, one could explain the magical thinking expressed in The Secret and ask why some people believe it and others don’t, along with asking why the competitor does or doesn’t.
I think we might have different definitions of what “general intelligence” is. I thought it meant something like, “being able to solve novel problems in some domain”; in this case, our domain is “human conversation”. I may be willing to extend the definition to say, ”...and also possessing the capacity to learn how to solve problems in some number of other domains”.
Your definition, though, seems to involve solving problems in any domain. I think this definition is too broad. No human is capable of doing everything; and most humans are only good at a small number of things. An average mathematician can’t compose music. An average musician can’t do calculus. Some musicians can learn calculus (given enough time and motivation), but others cannot. Some mathematicians can learn to paint; others cannot.
Perhaps you mean to say that humans are not generally intelligent, and neither are AIs who pass the Turing Test ? In this case, I might agree with you.
(Most) humans posses a certain level of general intelligence. Human groups, augmented by automation tools, and given enough time, possess a much more advanced general intelligence. The “no free lunch theorems” imply that it’s impossible to get a fully general intelligence in every environment, but we come pretty close.
I’ve somewhat refined my views of what would count as general intelligence in a machine; now I require mainly that it not be extremely stupid in any area that humans possess minimal competence at. Out-of-domain tests are implicit ways of testing for this, without doing the impossible task of testing the computer in every environment.
I’m not sure what that criticism is trying to say.
Assuming it’s an analogy with the god of the gaps, you might be saying that if the computer can pass the test the questioner can always pick a new requirement that he knows the computer can’t pass.
If this is what you are saying, then it’s wrong because of the flip side of the previous argument: just like the test doesn’t check to see if the subject succeeds in any specific item, it also doesn’t check to see if the subject fails in any specific item. In order for a computer to fail the test because of inability to do something, it has to show a pattern of inability that is different from the pattern of inability that a human would show. The questioner can’t just say “well, computers aren’t omniscient, so I know there’s something the computer will fail at”, pick that, and automatically fail the computer—you don’t fail because you failed one item.
Yep, this is it. I see no reason why we should hold computers to a higher standard than we do our fellow humans.
I’m not sure what kind of a “pattern of inability” an average human would show; I’m not even convinced that such a pattern exists in a non-trivial sense (f.ex., the average human surely cannot fly by will alone, but it would be silly to test for that).
All the tests that were proposed so far, such as “create a video” or “compose a piece of music” target a relatively small subset of humans who are capable of such tasks. Thus, we would in fact expect the average human to say, “sorry, I have no ear for music” or something of the sort—which is also exactly what an AI would say (unless it was actually capable of the task, of course). Many humans would attempt the task but fail; the AI could do that, too (by design or by accident). So, the new tests don’t really tell you much.
No computer is going to fail the Turing test because it can’t compose a piece of music. The questioner might ask it to do that, but if it replies “sorry, I have no ear for music” it doesn’t fail—the questioner then picks something else. If the computer can’t do that either, and if the questioner keeps picking such things, he may eventually get to the point where he says “Okay, I know there are some people who have no ear for music, but there aren’t many people who have no ear for music and can’t make a video and can’t paint a picture and…”. He will then fail the computer because although it is plausible that a human can’t do each individual item, it’s not very plausible that the human can’t do anything in the list. No specific item is a requirement to be a human, and no specific inability marks the subject as not being human.
But if all the things in that conjunction are creative endeavors, why do you think a human not being able to do any of them is implausible? I have no ear for music, don’t have video-creation skills, can’t paint a picture, can’t write a poem, etc. There are many similar people, whose talents lie elsewhere, or perhaps who are just generally low on the scale of human talent.
If you judge such people to be computers, then your success rate as a judge in a Turing test will be unimpressive.
If the questioner is competent, he won’t pick a list where it’s plausible that some human can’t do anything on the list. If he does pick such a list, he’s performing the questioning incompetently. I think implicit in the idea of the test is that we have to assume some level of competency on the part of the questioner; there are many more ways an incompetent questioner could fail to detect humans other than just ask for a bad set of creative endeavors.
(I think the test also assumes most people are competent enough to administer the test, which also implies that the above scenario won’t happen. I think most people know that there are non-creative humans and won’t give a test that consists solely of asking for creative endeavors—the things they ask the subject to do will include both creative and non-creative but human-specific things.)
I think this entire thread is caused by, and demonstrates, the fact that we increasingly have no idea what the heck we’re even trying to measure or detect with the Turing test (is it consciousness? human-level intelligence? general intelligence? what?) …
… which is entirely unsurprising, since as I say in another comment on this post, the Turing test isn’t meant to measure or detect anything.
To use it as a measure of something or a detector of something is to miss the point. This thread, where we go back and forth arguing about criteria, pretty much demonstrates said fact.
I think the Turing Test clearly does measure something: it measures how closely an agent’s behavior resembles that of a human. The real argument is not, “what does the test measure ?”, but “is measuring behavior similarity enough for all intents and purposes, or do we need more ?”
If we prefer to be pedantic, we must go further than that: the test measures whether an agent can fool some particular interrogator into having a no-better-than-chance probability of correctly discerning whether said agent is a human (in the case where the agent in question is not, in fact, a human).
How well that particular factor correlates with actual behavioral similarity to a human (and how would we define and measure such similarity? along what dimensions? operationalized how?), is an open question. It might, it might not. It might take advantage of some particular biases of the interrogator (e.g. pareidolia, the tendency to anthropomorphize aspects of the inanimate world, etc.) to make him/her see behavioral similarity where little exists (cf. Eliza and other chatbots).
(Remember, also, that Turing thought that a meaningful milestone would be for a computer to “play the imitation game so well that an average interrogator will not have more than 70 percent chance of making the right identification after five minutes of questioning.” ! [Emphasis mine.])
I do partly agree with this:
And of course the question then becomes: just what are our intents and/or purposes here?
I think we’ve hit this milestone already, but we kind of cheated: in addition to just making computers smarter, we made human conversations dumber. Thus, if we wanted to stay true to Turing’s original criteria, we’d need to scale up our present-day requirements (say, to something like 80% chance over 60 minutes), in order to keep up with inflation.
I can propose one relatively straightforward criterion: “can this agent take the place of a human on our social network graph ?” By this I don’t simply mean, “can we friend it on Facebook”; that is, when I say “social network”, I mean “the overall fabric of our society”. This network includes relationships such as “friend”, “employee”, “voter”, “possessor of certain rights”, etc.
I think this is a pretty good criterion, and I also think that it could be evaluated in purely functional terms. We shouldn’t need to read an agent’s genetic/computer/quantum/whatever code in order to determine whether it can participate in our society; we can just give it the Turing Test, instead. In a way, we already do this with humans, all the time—only the test is administered continuously, and sometimes we get the answers wrong.
Agreed. Pretty much the only creative endeavour I’m capable of is writing computer code; and it’s not even entirely clear whether computer programming can be qualified as “creative” in the first place. I’m a human, though, not an AI. I guess you’d have to take my word for it.
My own practical version of the Turing test is “can we be friends?” (It used to be “can we fall in love?”) Once an AI passes a test like that I think the question of whether it’s “genuinely” conscious should be dissolved.
Actually, scratch that: either way, I think the question of whether something is “genuinely” conscious should be dissolved.
Mostly I agree with your last sentence.
I mostly think “is X conscious?” can be usefully replaced by a combination of “does X think?” and “does X experience pain/pleasure/etc.?” In both cases, answering the question is a matter of making inferences from incomplete knowledge, and it’s always possible to be wrong, but it’s also usually possible to be legitimately confident. If there’s anything else being asked, I don’t know what it is.
I don’t think that’s dissolving far enough. The questions those questions are stand-ins for, I think, are questions like “does X deserve legal consideration?” or “does X deserve moral consideration?” and we might as well be explicit about this.
I don’t think those questions are mere stand-ins. I think the answers to “does X deserve legal consideration?” or “does X deserve moral consideration?” depend heavily on “Is X conscious?” and “Does X experience pain/pleasure?” That is, if we answer “Is X conscious?” and “Does X experience pain/pleasure?” then we can answer “does X deserve legal consideration?” and “does X deserve moral consideration?”
If “Is X conscious?” and “Does X experience pain/pleasure?” were simply stand-ins for “does X deserve legal consideration?” or “does X deserve moral consideration?”, then if we answered the latter two we’d stop caring about the former. I don’t think that’s so. There are still very interesting, very deep scientific questions to be answered about just what it means when we say something is conscious.
The problem is that I, for one, don’t know what the question “Is X conscious?” means and I’m not sure how to judge “Does X experience pain/pleasure?” in a non-biological context either. Nor has anyone else ever convinced me they know the answers to these questions. Still, it does seem as if neurobiology is making slow progress on these questions so they’re probably not intractable or meaningless. When all is said and done, they may not mean exactly what we vaguely feel they mean today; but I suspect that “conscious” will be more like the concept of “atom” than the concept of “ether”. I.e. we’ll recognize a clear connection between the original use of the word and the much more refined and detailed understanding we eventually come to. On the other hand, I could be wrong about that; and consciousness could turn out to be as useless a concept as ether or phlogiston.
Yeah, I waffled about this and ultimately decided not to say that, but I’m not confident.
I’m not really clear on whether what people are really asking is (e.g.) “does X deserve moral consideration?”, or whether it just happens to be true that people believe (e.g.) that it’s immoral to cause pain, so if X can experience pain it’s immoral to cause X pain, and therefore X deserves moral consideration.
But I agree with you that if the former turns out to be true, then that’s the right question to be asking.
Admittedly, my primary reason for being reluctant to accept that is that I have no idea how to answer that question a priori, so I’d rather not ask it… which of course is more bias than evidence.
So how do you decide whether or not X deserves moral consideration? Based on something like long-term interactions, or looking at its code? I mean, if the real question is “how do I feel about X,” we might as well say so explicitly.
Dunno. But I’d rather admit my ignorance about the right question.
Are you asking if it can consider you a friend or you can consider it a friend?
There has been a robot designed to love. Due to its simplistic nature, it was a crazy stalker, but nonetheless it can love. Emotion is easy. Intelligence is hard. Is your test just to see if it’s human enough that you feel comfortable calling whatever emotion it feels “love”?
Why should I attribute emotions to this contraption? Because there’s a number somewhere inside it that the programmer has suggestively called “love”? Because it interacts in ways which are about as similar to love as Eliza is to a conversational partner?
Because it acts in a manner that keeps it and the person of interest near each other.
Why should I attribute emotions to you?
So does a magnet. So does a homing missile. But a north pole does not love a south pole, and a missile does not love its target. Neither do rivers long to meet the sea, nor does fire long to ascend to heaven, nor do rocks desire the centre of the earth.
Because you experience them yourself, and I seem to be the same sort of thing as you are. Without any knowledge of what emotions are, that’s the best one can do.
This does not work for robots at the current state of the art.
True, but we can make robots better than that. The one I mentioned was capable of changing to be like that in the presence of a person. I don’t know much about that particular robot, but we can make ones that will generally act so as to put themselves in situations similar to the one they’re in at a given time, which is the best way I can define happiness, and we can make them happy when they’re near a specific person.
In any case, there is still a more basic problem. Why do you say that a magnet doesn’t love? I’m not saying that it does to any non-negligible extent, but it would be helpful to have a definition more precise than “do what humans do”.
Can you give an example of when it possibly could work for robots? It sounds like you’re saying that it’s not love unless they’re conscious. While that is a necessary condition for making it a consciousness test, if that’s how you know it’s love then it’s circular: in order to prove it’s conscious, it has to prove it can love; in order to prove it can love, it must prove that it’s conscious.
No, because I don’t know what emotions are. I don’t believe anyone else does either. Neither does anyone know what consciousness is. Nobody even knows what an answer to the question would look like.
I seem to ascribe emotions to a system—more generally, I ascribe cognitive states, motives, and an internal mental life to a system—when its behavior is too complicated for me to account for with models that don’t include such things.
I can describe the behavior of a magnet without resorting to such things, so I don’t posit them.
That’s not to say that I’m correct to ascribe them to systems with complicated behavior… I might be; I might not be. Merely to say that it’s what I seem to do. It’s what other humans seem to do as well… hence the common tendency to ascribe emotions and personalities to all sorts of complex phenomena.
If I were somehow made smart enough to fully describe your behavior without recourse to what Dennett calls the intentional stance, I suspect I would start to experience your emotional behavior as “fake” somehow.
This isn’t quite a fully baked idea yet, but personlike agents are so ubiquitous in human modeling of complex systems that I suspect they’re a default of some kind—and that this doesn’t necessarily indicate a lack of deep understanding of a system’s behavior. Programmers often talk about software they’re working on in agent-like terms—the component remembers this, knows about that, has such-and-such a purpose in life—but this doesn’t correlate with imperfect understanding of the software; it’s just a convenient way of thinking about the problem. Likewise for people—I’m not a psychologist or a neuroscientist, but I doubt people in those professions think of their fellows’ emotions as less real for understanding them better than I do.
(The main alternative for complex systems modeling seems to be thinking of systems as an extension of the self or another agent, which seems to crop up mostly for systems tightly controlled by those agents. Cars are a good example—I don’t say “where is my car parked?”, I say “where am I parked?”.)
See also
You mean like a pseudorandom number generator?
Motives are easy to model. You just set what the system optimizes for. The part that’s hard to model is creativity.
That’s a bad sign. My emotional behavior wouldn’t become fake due to your intelligence.
If I can consider it a friend. I also think “is whatever this robot experiencing genuinely love?” should be dissolved.
I think the OP, and many commenters here, might be missing the point of the Turing test (and I can’t help but suspect that the cause is not having read Turing’s original article describing the idea; if so, I highly recommend remedying that situation).
Turing was not trying to answer the question “is the computer conscious”, nor (the way he put it) “can machines think”. His goal was to replace that question.
Some representative quotes (from Turing’s “Computing Machinery and Intelligence”; note that “the imitation game” was Turing’s own term for what came to be called the “Turing test”):
Basically, treating the Turing test as if it was ever even intended to give an answer to questions like “does this AI possess subjective consciousness” is rather silly. That was not even close to the intent. If we want to figure out whether something is conscious, or what have you, we’ll have to find some other way. The Turing test just won’t cut it — nor was it ever meant to.
If the Turing test had been “can the computer win on Jeopardy?”, then we’d agree nowadays that substituting that for “can machines think?” would have been a poor substitution.
In Turing’s phrasing:
I’m questioning whether the Turing test is closely related to machine thinking, for machines calibrated to pass the Turing test.
For machines not calibrated to the test (eg whole brain emulations), I still think the two questions are closely related. Just as SAT scores are closely related to intelligence… for people who haven’t trained on SAT tests.
If “Can machines think?” is a meaningless question, then “Does the Turing test tell us whether machines can think?” must be equally meaningless. (That is, until we figure out what the heck we’re asking when we ask “Can this machine think?”, investigating how closely related the Turing test is to said issue is futile.)
Now, the following is my interpretation and not a paraphrasing of Turing. I think what Turing meant when he said that “can machines think” and “can a machine win the imitation game” are related questions is this: just like “can machines think” depends strongly on just what we mean by “think”, which depends on societal views and prevailing attitudes (see quote below[1]), so is our interpretation of the imitation game, and what an agent’s performance on it implies about that agent, closely related to societal attitudes and perception of agents as “thinking”, “conscious”, etc.
Turing thought that societal attitudes shaped what we think “thinking” is, and what kinds of things “think”. He also thought that success in the imitation game would herald a change in societal attitudes, such that people would think of machines as being able to “think”, and that this would render the philosophical discussion moot. At no point in this process do we ever define “thinking” or undertake any systematic way of determining whether an agent “thinks” or not.
Personally, I think he was right: at some point, past some threshold of apparent similarity of machines to humans, societal attitudes will shift and this whole business of administering tests to detect some magic spark will be rendered moot. Of course, that hasn’t happened yet, so here we are, still lacking anything resembling a proper definition of “thinking”. The Turing test does not help us in that regard.
[1] The promised quote:
I take your point, but Turing’s paper wasn’t simply an exercise in applied sociology. And the Turing test does help detect thinking, without having to define it: just consider applying it to a whole brain emulation. The Turing test and the definition of thinking are related; Turing was being disingenuous if he was pretending otherwise. He was actually proposing a definition of thinking, and stating that it would become the universally accepted one, the one that would be the “correct” simplification of the currently muddled concept.
Well, there’s this:
http://swarma.org/thesis/doc/jake_224.pdf
Certainly the Turing test can be viewed as an operationalization of “does this machine think?”. No argument there. I also agree with you concerning what Turing probably had in mind.
The problem is that if we have in mind (perhaps not even explicitly) some different definition of thinking or, gods forbid, some other property entirely, like “consciousness”, then the Turing test immediately stops being of much use.
Here is a related thing. John Searle, in his essay “Minds, Brains, and Programs” (where he presents the famous “Chinese room” thought experiment), claims that even if you a) place the execution of the “Chinese room” program into a robot body, which is then able to converse with you in Chinese, or b) simulate the entire brain of a native Chinese speaker neuron-by-neuron, and optionally put that into a robot body, you will still not have a system that possesses true understanding of Chinese.
Now, taken to its logical extreme, this is surely an absurd position to take in practice. We can imagine a scenario where Searle meets a man on the street, strikes up a conversation (perhaps in Chinese), and spends some time discoursing with the articulate stranger on various topics from analytic philosophy to dietary preferences, getting to know the man and being impressed with his depth of knowledge and originality of thought, until at some point, the stranger reaches up and presses a hidden button behind his ear, causing the top of his skull to pop open and reveal that he is in fact a robot with an electronic brain! Dun dun dun! He then hands Searle a booklet detailing his design specs and also containing the entirety of his brain’s source code (in very fine print), at which point Searle declares that the stranger’s half of the entire conversation up to that point has been nothing but the meaningless blatherings of a mindless machine, devoid entirely of any true understanding.
It seems fairly obvious to me that such entities would, like humans, be beneficiaries of what Turing called “the polite convention” that people do, in fact, think (which is what lets us not be troubled by the problem of other minds in day-to-day life). But if someone like John Searle were to insist that we nonetheless have no direct evidence for the proposition that the robots in question do “think”, I don’t see that we would have a good answer for him. (Searle’s insistence that we shouldn’t question whether humans can think is, of course, hypocritical, but that is not relevant here.) Social conventions to treat something as being true do not constitute a demonstration that said thing is actually true.
It is perhaps worth noting that Searle explicitly posits in that essay that the system is functioning as a Giant Lookup Table.
If faced with an actual GLUT Chinese Room… well, honestly, I’m more inclined to believe that I’m being spoofed than trust the evidence of my senses.
But leaving that aside, if faced with something I somehow am convinced is a GLUT Chinese Room, I have to rethink my whole notion of how complicated conversation actually is, and yeah, I would probably conclude that the entire conversation up to that point has been devoid entirely of any true understanding. (I would also have to rethink my grounds for believing that humans have true understanding.)
I don’t expect that to happen, though.
Actually, Searle’s description of the thought experiment does include a “program”, a set of rules for manipulating the Chinese symbols provided to the room’s occupant. Searle also addresses a version of the contrary position (the pro-AI position, as it were) that posits a simulation of an actual brain (to which I alluded in the grandparent). He doesn’t think that would possess true understanding, either.
I think that if we’ve gotten to the point where we’re rethinking whether humans have true understanding, we should instead admit that we haven’t the first clue what “true understanding” is or what relation, if any, said mysterious property has to do with whatever we’re detecting in our test subjects.
Oh, and: GAZP vs. GLUT.
Wouldn’t such a GLUT by necessity require someone possessing immensely fine understanding of Chinese and English both, though? You could then say that the person+GLUT system as a whole understands Chinese, as it combines both the person’s symbol-manipulation capabilities and the actual understanding represented by the GLUT.
You might still not possess understanding of Chinese, but that does not mean a meaningful conversation has not taken place.
I have no idea whether a GLUT-based Chinese Room would require someone possessing immensely fine understanding of Chinese and English both. As far as I can tell, a GLUT-based Chinese Room is impossible, and asking what is or isn’t required to bring about an impossible situation seems a silly question. Conversely, if it turns out that a GLUT-based Chinese Room is not impossible, I don’t trust my intuitions about what is or isn’t required to construct one.
I have no problem with saying a Chinese-speaking-person+GLUT system as a whole understands Chinese, in much the same sense that I have no problem saying that a Chinese-speaking-person+tuna-fish-sandwich system as a whole understands Chinese. I’m not sure how interesting that is.
I’m perfectly content to posit an artificial system capable of understanding Chinese and having a meaningful conversation. I’m unable to conceive specifically of a GLUT that can do so.
I don’t think it’s that hard to conceive of. Imagine that the Simulation Argument is true; then, we could easily imagine a GLUT that exists outside of our own simulation, using additional resources; then our Chinese Room could just be an interface for such a GLUT.
As you said though, I don’t find the proposal very interesting, especially since I’m not a big fan of the Simulation Argument anyway.
I find I am unable, on brief consideration, to conceive of a GLUT sitting in some real world within which my observable universe is being computed… I have no sense of what such a thing might be like, or what its existence implies about the real world and how it differs from my observed simulation, or really much of anything interesting.
It’s possible that I might be able to if I thought about it for noticeably longer than I’m inclined to.
If you can do so easily, good for you.
Not necessarily. Theoretically, one could have very specific knowledge of Chinese, possibly acquired from very limited but deep experience. Imagine one person who has spoken Chinese only at the harbor, and has complete and total mastery of the maritime vocabulary of Chinese but would lack all but the simplest verbs relevant to the conversations happening just a mile further inland. Conceivably, a series of experts in a very localized domain could separately contribute their understanding, perhaps governed by a person who understands (in English) every conceivable key to the GLUT, but does not understand the values which must be placed in it.
Then, imagine someone whose entire knowledge of Chinese is the translation of the phrase: “Does my reply make sense in the context of this conversation?” This person takes an arbitrary amount of time, randomly combining phonemes and carrying out every conceivable conversation with an unlimited supply of Chinese speakers. (This is substantially more realistic if there are many people working in a field with fewer potential combinations than language). Through perhaps the least efficient trial and error possible, they learn to carry on a conversation by rote, keeping only those conversational threads which, through pure chance, make sense throughout the entire dialogue.
In neither of these human experts do we find a real understanding of Chinese. It could be said that the understandings of the domain experts combine to form one great understanding, but the inefficient trial-and-error GLUT manufacturers certainly do not have any understanding, merely memory.
I agree on the basic point, but then my deeper point was that somewhere down the line you’ll find the intelligence(s) that created a high-fidelity converter for an arbitrary amount of information from one format to another. Searle is free to claim that the system does not understand Chinese, but its very function could only have been imparted by parties who collectively speak Chinese very well, making the room at the very least a medium of communication utilizing this understanding.
And this is before we mention the entirely plausible claim that the room-person system as a whole understands Chinese, even though neither of its two parts does. Any system you’ll take apart to sufficient degrees will stop displaying the properties of the whole, so having us peer inside an electronic brain asking “but where does the intelligence/understanding reside?” misses the point entirely.
This does not pass the simplest plausibility test. Do you imagine that being at a harbor causes people to have only conversations which are uniquely applicable to harbor activities? Does one not need words and phrases for concepts like “person”, “weather”, “hello”, “food”, “where”, “friend”, “tomorrow”, “city”, “want”, etc., not to mention rules of Chinese grammar and syntax? Such a “harbor-only” Chinese speaker may lack certain specific vocabulary, but he certainly will not lack a general understanding of Chinese.
Your other example is even sillier, especially given that the number of possible conversations in a human language is infinite. For one thing, a conversation where one person is constantly asking “Does my reply make sense?” is very, very different from the “same” conversation without such constant verbal monitoring. (Not to mention the specific fact that your imaginary expert would not be able to understand his interlocutor’s response to his question about whether his utterances made sense.)
You make some valid points.
A more realistic version would be for an observer to record all conversations between two Chinese speakers with length N, where N is some arbitrarily large but still finite conversation length. (If a GLUT were to capture every possible conversation, you are correct in saying that it would have to be infinite.)
From a sufficiently large sample size (though it is implausible to capture every probable conversation in any realistic amount of time, not to mention in any amount of time during which the language is relatively stable and unchanging), a tree of conversations could be built, with an arbitrarily large probability of including a given conversation within it.
From this, one could build a GLUT (though it would probably be more efficient as a tree) of the possible questions given context and the appropriate responses. Though it would be utterly unfeasible to build, that is a limitation of the availability of data, rather than of the GLUT structure itself. It would not be perfect—one cannot build an infinite GLUT, nor can one acquire the infinite amount of data with which to fill it—but it could, perhaps, surpass even a native speaker by some measures.
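To make the tree idea concrete, here is a toy sketch in Python (the corpus, the utterances, and the pick-the-most-frequent-continuation rule are all invented for illustration; they are not part of the proposal above):

```python
from collections import defaultdict, Counter

# Build a conversation "tree" from a corpus of recorded dialogues:
# map each conversation prefix to the utterances observed to follow it.
corpus = [
    ["hello", "hi", "how are you?", "fine"],
    ["hello", "hi", "nice weather", "indeed"],
    ["hello", "hey", "bye", "bye"],
]

tree = defaultdict(Counter)
for convo in corpus:
    for i in range(len(convo) - 1):
        tree[tuple(convo[:i + 1])][convo[i + 1]] += 1

def reply(history):
    """Return the most frequently observed continuation, or None
    if this prefix never occurred in the recorded corpus."""
    options = tree.get(tuple(history))
    return options.most_common(1)[0][0] if options else None

print(reply(["hello"]))   # "hi" (observed twice, vs. "hey" once)
print(reply(["goodbye"])) # None -- the GLUT has a hole
```

The `None` case is exactly the weakness being discussed: any conversation that wandered outside the recorded sample would leave the table speechless.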
I remain dubious.
Consider: what would the table contain as appropriate responses for the following questions? (Each question would certainly appear many, many times in our record of all conversations up to length N.)
“Hello, what is your name?”
“Where do you live?”
“What do you look like?”
“Tell me about your favorite television show.”
Remember that a GLUT, by definition, matches each input to one output. If you have to algorithmically consider context, whether environmental (what year is it? where are we?), personal (who am I?), or conversation history (what’s been said up to this point?), then that is not a GLUT, it is a program. You can of course convert any program that deterministically gives output for given input into a GLUT, but to do that successfully, you really do need all possible inputs and their outputs; and “input” here means “question, plus conversation history, plus complete description of world-state” (complete because we don’t know what context we’ll need in order to give an appropriate response).
In other words, to construct such a GLUT, you would have to be well-nigh omniscient. But, admittedly, you would not then have to “know” any Chinese.
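The program-to-GLUT conversion can be sketched in miniature. Assuming a toy deterministic responder and a finite input domain (both invented here for illustration), exhaustive enumeration “compiles” the program into a pure lookup table, after which answering requires no computation at all:

```python
from itertools import product

# A tiny deterministic "program": its reply depends on the question
# and on a piece of world-state (here, just the current year).
def respond(question, year):
    if question == "what year is it?":
        return str(year)
    return "I don't know."

# Compile it into a GLUT by enumerating every (question, world-state)
# pair in a finite domain. The table now "contains" everything the
# program knew, but it is just inert data.
questions = ["what year is it?", "where am I?"]
years = range(2020, 2025)
glut = {(q, y): respond(q, y) for q, y in product(questions, years)}

print(glut[("what year is it?", 2023)])  # "2023"
print(glut[("where am I?", 2023)])       # "I don't know."
```

Note how the world-state has to be part of the key: without it, the table could not answer even this trivial question, which is why a conversational GLUT would need something close to omniscience baked in.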
I wouldn’t; or, at least, not necessarily.
Just for simplicity, let’s say the agent and the experimenter are conversing via text-based chat. The agent and the experimenter take turns outputting a single line of text (it could be a very long line); the experimenter always goes first, saying “hello”.
In this case, the agent’s side of the conversation can be modeled as the function F(H(t), t) where t is the current line number, and H(t) is the sequence of inputs that the agent received up to point t. Thus, H(0) is always {”hello”}, as per above. H(2) might be something like {”hello”, “your name sounds funny”}, etc.
We know that, since F is a function, it is a relation that maps each possible input to a single output—so it’s basically a lookup table. In the ideal case, the number of possible inputs is infinite (trivially, we could say “hello”, “helloo”, “hellooo”, and so on, ad infinitum), and thus the lookup table would need to be infinitely large. However, we humans are finite creatures, and thus in practice the lookup table would only need to be finitely large.
Of course, practically speaking, you’d probably still need a storage device larger than the Universe to encode even a finite lookup table of sufficient size; but this is a practical objection, which does not a priori prohibit us from implementing a Turing-grade agent as a GLUT.
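As a minimal sketch of the F(H(t), t) model in Python (the table entries are invented for illustration; a real Turing-grade table would be astronomically larger):

```python
# A toy GLUT agent: it maps the entire input history (a tuple of the
# experimenter's lines) to a single output line. Since the history H
# determines the line number t, keying on H alone suffices.
GLUT = {
    ("hello",): "Hi there. My name sounds funny, I know.",
    ("hello", "your name sounds funny"): "I get that a lot.",
}

def agent_reply(history):
    """F(H(t)): pure lookup, no computation over the input.
    A true GLUT would enumerate every possible history and
    never hit the fallback."""
    return GLUT.get(tuple(history), "(no entry)")

history = ["hello"]
print(agent_reply(history))  # looked up directly from the table
history.append("your name sounds funny")
print(agent_reply(history))
```

The point of the sketch is just that nothing resembling processing happens at reply time; all the apparent wit was frozen into the table in advance.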
You’re correct that it doesn’t a priori prohibit such a thing. It does, however, bring my prior probability of encountering such a thing vanishingly low. Faced with an event that somehow causes me to update that vanishingly small probability to the point of convincing me it’s true I am vastly surprised, and that vast surprise colors all of my intuitions about interactions with nominally intelligent systems. Given that, it’s not clear to me why I should keep believing that I was having an intelligent conversation a moment earlier.
What do you mean by “intelligent conversation”? Do you mean, “a conversation with an intelligent agent”, or “a conversation whose contents satisfy certain criteria”, and if so, which ones? I’ll assume you mean the former for now.
Let’s say that you had a text-only chat with the agent, and found it intellectually stimulating. You thought that the agent was responding quite cleverly to your comments, had a distinct “writer’s voice”, etc.
Now, let’s imagine two separate worlds. In world A, you learned that the agent was in fact a GLUT. Surprised and confused, you confronted it in conversation, and it responded to your comments as it did before, with apparent intelligence and wit. But, of course, now you knew better than to fall for such obvious attempts to fool you.
In world B, the exact same thing happened, and the rest of your conversation proceeded as before, with one minor difference: unbeknownst to you, the person who told you that the agent was a GLUT was himself a troll. He totally gaslighted you. The agent isn’t a GLUT or even an AI; it’s just a regular human (and the troll’s accomplice), a la the good old Mechanical Turk.
It sounds to me like if you were in world B, you’d still disbelieve that you were having a conversation with an intelligent agent. But in world B, you’d be wrong. In world A, you would of course be right.
Is there any way for you to tell which world you’re in (I mean, without waterboarding that pesky troll or taking apart the Chinese Room to see who’s inside, etc.)? If there is no way for you to tell the difference, then what’s the difference?
By the way, I do agree with you that, in our real world, the probability of such a GLUT existing is pretty much zero. I am merely questioning the direction of your (hypothetical) belief update.
By construction, there’s no way for me to tell… that is, I’ve already posited that some event (somehow, implausibly) convinced me my interlocutor is a GLUT.
In world A, “I” was correct to be convinced; my interlocutor really was (somehow, implausibly) a GLUT, impossible as that seems. In world B, “I” was (somehow, implausibly) incorrectly convinced.
There’s all kinds of things I can be fooled about, and knowing that I can be fooled about those things should (and does) make me more difficult to convince of them. But if, even taking that increased skepticism into account, I’m convinced anyway… well, what more is there to say? At that point I’ve been (somehow, implausibly) convinced, and should behave accordingly.
To say “Even if I’ve (somehow, implausibly) been exposed to a convincing event, I don’t update my beliefs” is simply another way of saying that no such convincing event can exist—of fighting the hypothetical.
Mind you, I agree that no such convincing event can exist, and that the hypothetical simply is not going to happen. But that’s precisely my point: if it does anyway, then I am clearly deeply confused about how the universe works; I should at that point sharply lower my confidence in all judgments even vaguely related to the nonexistence of GLUT Chinese Rooms, including “I can tell whether I’m talking to an intelligent system just by talking to them”.
The extent of my confidence in X ought to be proportional to the extent of my confusion if I come (somehow) to believe that X is false.
I think I see what you’re saying—discovering that something as unlikely as a GLUT actually exists would shake your beliefs in pretty much everything, including the Turing Test. This position makes sense, but I think it’s somewhat orthogonal to the current topic. Presumably, you’d feel the same way if you became convinced that gods exist, or that Pi has a finite number of digits after all, or something.
Not quite.
Discovering that something as unlikely as a conversation-having GLUT exists would shake my beliefs in everything related to conversation-having GLUTs. My confidence that I’m wearing socks right now would not decrease much, but my confidence that I can usefully infer attributes of a system by conversing with it would decrease enormously. Since Turing Tests are directly about the latter, my confidence about Turing Tests would also decrease enormously.
More generally, any event that causes me to sharply alter my confidence in a proposition P will also tend to alter my confidence in other propositions related to P, to an extent proportional to their relation.
An event which made me confident that pi was a terminating decimal after all, or that some religion’s account of its god(s) was accurate, etc. probably would not reduce my confidence in the Turing Test nearly as much, though it would reduce my confidence in other things more.
Why not ? Encountering a bona-fide GLUT that could pass the Turing test would be tantamount to a miracle. I personally would begin questioning everything if something like that were to happen. After all, socks are objects that I had previously thought of as “physical”, but the GLUT would shake the very notions of what a “physical” object even is.
Why that, and not your confidence about GLUTs ?
Of course my confidence about GLUTs would also decrease enormously in this scenario… sorry if that wasn’t clear.
More generally, my point here is that a conversation-having GLUT would not alter my confidence in all propositions equally, but rather would alter my confidence in propositions to a degree proportional to their relation to conversation-having GLUTs, and “I can usefully infer attributes of a system by conversing with it” (P1) is far more closely related to conversation-having GLUTs than “I’m wearing socks” (P2).
If your point is that my confidence in P2 should nevertheless shift significantly, even if much less than P1’s… well, maybe. Offhand, I’m not sure my brain can span a wide enough range of orders of magnitude of confidence-shift to consistently represent the updates to both P1 and P2, but I’m not confident either way.
I agree with you concerning Searle’s errors (see my takes on Searle at http://lesswrong.com/lw/ghj/searles_cobol_room/ http://lesswrong.com/lw/gyx/ai_prediction_case_study_3_searles_chinese_room/ )
I think the differences between us are rather small, in fact. I do have a different definition of thinking, which is not fully explicit. It would go along the lines of “a thinking machine should demonstrate human-like abilities in most situations and not be extremely stupid in some areas”. The intuition is that if there is a general intelligence, rather than simply a list of specific rules, then its competence shouldn’t completely collapse when facing unusual situations.
The “test systems on situations they’re not optimised for” approach was trying to establish whether there would be such a collapse in skill. Of course you can’t test for every situation, but you can get a good idea this way.
This seems like a slightly uncharitable reading of Searle’s position.
Searle’s steadfast refusal to consider perfectly reasonable replies to his position, and his general recalcitrance in the debate on this and related questions, makes him unusually vulnerable to slightly uncharitable readings. The fact that his justification seems to be “human brains have unspecified magic that make humans conscious, and no I will not budge from that position because I have very strong intuitions” means, I think, that my reading is not even very uncharitable.
Oh, and on the subject of whole brain emulations: Greg Egan’s recent novel Zendegi (despite being, imo, rather poor overall), does make a somewhat convincing case that an emulation of a person’s brain/consciousness/personality might pass something like a Turing test and still not possess subjective consciousness or true general intelligence on a human level.
When this topic comes up I’m always reminded of a bit in John Varley’s Golden Globe, where our hero asks an advanced AI whether it’s actually conscious, and it replies “I’ve thought about that question a lot and have concluded that I’m probably not.”
Of course, “intelligence” here is being measured with an IQ test, which I’m guessing also loses its predictive power if you train at it.
As far as I understand, the Turing Test renders questions such as “does X really possess subjective consciousness or is it just pretending” simply irrelevant. Yes, applying the Turing Test in order to find answers to such questions would be silly; but mainly because the questions themselves are silly.
Well, right; hence Turing’s dismissal of “do machines really X” as “too meaningless to deserve discussion”. If we insist on trying to get an answer to such a question, “Turing-test it harder!” is not the way to go. We should, at the very least, figure out what the heck we’re even asking, before trying to shoehorn the Turing test into answering it.
Why do you think (if you agree with Turing) that the question of whether machines think is too meaningless to deserve discussion? …if that question isn’t a bit paradoxical?
I do agree with Turing on this one. What matters is how an agent acts, not what powers its actions. For example, what if I told you that in reality, I don’t speak a word of English? Whom are you going to believe—me, or your lying eyes?
I’m willing to go even farther out on a limb here, and claim that all the serious objections that I’ve seen so far are either incoherent, or presuppose some form of dualism—which is likewise incoherent. They all boil down to saying, “No matter how closely a machine resembles a human, it will never be truly human, because true humans have souls/qualia/consciousness/etc. We have no good way of ever detecting these things or even fully defining what they are, but come on, whom are you gonna believe? Me, or your lying eyes?”
http://www.youtube.com/watch?v=dd0tTl0nxU0
I remember hearing the story of a mathematical paper published in English but written by a Frenchman, containing the footnotes:
¹ I am grateful to professor Littlewood for helping me translate this paper into English.²
² I am grateful to professor Littlewood for helping me translate this footnote into English.³
³ I am grateful to professor Littlewood for helping me translate this footnote into English.
Why was no fourth footnote necessary?
So… the answer is… if I told you I don’t speak any English, you’d believe me? Not sure what your point is here.
Well, I posted the link mostly as a joke, but we can take a serious lesson from it: yes, maybe I would believe you; it would depend. If you told me “I don’t speak English”, but then showed no sign of understanding any questioning in English, nor ever showed any further ability to speak it… then… yeah, I’d lean in the direction of believing you.
Of course if you tell me “I don’t speak English” in the middle of an in-depth philosophical discussion, carried on in English, then no.
But a sufficiently carefully constructed agent could memorize a whole lot of sentences. Anyway, this is getting into GAZP vs. GLUT territory, and that’s being covered elsewhere in the thread.
There are already quite a few comments on this post—do you have a link to the thread in question?
http://lesswrong.com/lw/hgl/the_flawed_turing_test_language_understanding_and/90rl
What’s the reasoning here? This is the sort of thing that seems plausible in many cases but the generality of the claim sets off alarm bells. Is it really true that we never care about the source of a behavior over and above the issue of, say, predicting that behavior?
Well, this isn’t quite the issue. No one is objecting to the claim that machines can be people (as, I think, Dennett aptly said, this would hardly be surprising given that people are machines). Indeed, it’s out of our deep interest in that possibility that we made this mistake about Turing tests: I for one would like to be forgiven for being blind to the fact that all the Turing test can tell us is whether or not a certain property (defined entirely in terms of the test) holds of a certain system. I had no antecedent interest in that property, after all. What I wanted to know is ‘is this machine a person’, eagerly/fearfully awaiting the day that the answer is ‘yes!’.
You may be right that my question ‘is this machine a person’ is incoherent in some way. But it’s surprising that the Turing test involves such a serious philosophical claim.
I wasn’t intending to make a claim quite that broad; but now that you mention it, I am going to answer “yes”—because in the process of attempting to predict the behavior, we will inevitably end up building some model of the agent. This is no different from predicting the behaviors of, say, rocks.
If I see an object whose behavior is entirely consistent with that of a roughly round rock massing about 1kg, I’m going to go ahead and assume that it’s a round-ish 1kg rock. In reality, this particular rock may be an alien spaceship in disguise, or in fact all rocks could be alien spaceships in disguise, but I’m not going to jump to that conclusion until I have some damn good reasons to do so.
My point is not that the Turing Test is a serious high-caliber philosophical tool, but rather that the question “is this agent a person” is a lot simpler than philosophers make it out to be.
I do agree with Turing, but I’m reluctant to indulge this digression in the current comment thread. My point was that regardless of whether we think that “Can machines think?” is meaningless, Turing certainly thought so, and he did not invent his test with the purpose of answering said question. When we attempt to use the Turing test to determine whether machines think, or are conscious, or any such thing, we’re a) ignoring the design intent of the test, and b) using the wrong tool for the job. The Turing test is unlikely to be of any great help in answering such questions.
Consider that point amply made, thanks.
Humans seem to have something similar, e.g. humans who are unable to form mental imagery can talk normally but perform badly on any task which is difficult to do without mental imagery. And in programming there are people who seem intelligent on the basis of how they talk about programming but are surprisingly bad at constructing anything that works for a given, fixed task.
Actually people without mental imagery do well on tasks that involve mental imagery: http://discovermagazine.com/2010/mar/23-the-brain-look-deep-into-minds-eye
I don’t think the “over-fitting” problem applies to the Turing Test: you can ask the candidate about anything, and adapt your later questions accordingly. There are proofs in computational complexity (that I’m too lazy to look up right now) showing that you can’t pass this kind of test (except with exponentially small probability) without containing a polynomial-time algorithm for the entire problem space. (It’s related to the question of which problems are IP-complete—i.e. the hardest among those problems that can be quickly solved via interactive proof.)
It would only be analogous to the test of the students if you published a short list of acceptable topics for the TT and limited the questions to that. Which they don’t do.
Edit: If you were right, it would be much easier to construct such a “conversation savant” than it has proven to be.
Watson shocked me—I didn’t think that type of performance was possible without AI completeness. That was a type of savant that I thought couldn’t happen before AGI.
It might be that passing for a standard human in a Turing test is actually impossible without AGI—I’m just saying that I would want more proof in the optimised-for-Turing-test situation than in others.
This interests me (as someone professionally involved in the creation of savants, though not linguistic ones). Can you articulate why you thought that?
It wasn’t formalised thinking. I bought into the idea of AI-complete problems, ie that there were certain problems that only a true AI could solve—and that if it could, it could also solve all others. I was also informally thinking that linguistic ability was the queen of all human skills (influenced by the Turing test itself and by the continuous failure of chatterbots). Finally, I wasn’t cognisant of the possibilities of Big Data to solve these narrow problems by (clever) brute force. So I had the image of a true AI being defined by the ability to demonstrate human-like ability on linguistic problems.
The game Watson was playing was non-interactive[1] -- that is, unlike with the TT, you could not change the later Jeopardy questions, based on Watson’s answers, in an attempt to make it fail.
Had they done so, that would have forced an exponential blowup in the (already large) amount it would have to learn to get the same rate of correct answers.
(Not that humans would have done better in that case, of course!)
Interactivity makes a huge difference because you can focus away from its strong points and onto its weak points, thus forcing all points to be strong in order to pass.
[1] “non-adaptive” may be a more appropriate term in this context, but I say “interactive” because of the relevance of theorems about the IP complexity class (and PSPACE, which is equal).
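The force of interactivity can be put in a toy model (a minimal sketch in Python; the domains, scores, and threshold are all invented for illustration and correspond to nothing in the actual theorems):

```python
def interrogate(candidate, domains, threshold=0.5):
    # Toy model of an adaptive interrogator: free to wander into any
    # topic and to dwell on whatever looks weak, so the effective
    # grade is the candidate's worst domain, not its average.
    return min(candidate(d) for d in domains) >= threshold

def chatterbot(domain):
    # Hypothetical narrow savant: superb in one area, lost elsewhere.
    return 0.9 if domain == "trivia" else 0.1

def generalist(domain):
    # Hypothetical all-rounder: merely decent everywhere.
    return 0.6

topics = ["trivia", "small talk", "planning", "arithmetic"]
print(interrogate(chatterbot, topics))   # False: one weak spot sinks it
print(interrogate(generalist, topics))   # True
```

A non-adaptive test, by contrast, behaves more like an average over a fixed question list, which is exactly what a savant can be tuned for.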
I’m not saying that Watson could pass, or almost pass a Turing test. I’m saying that Watson demonstrated a combination of great quasi-linguistic skill and great general incompetence that I wasn’t expecting to be possible. It proved that a computer could be “taught to the test” in at least some areas.
So I think we should keep open the possibility that a computer could be taught to the Turing test as well.
Well, yes, if you make the test non-adaptive, it’s (exponentially) easier to pass. For example, if you limit the “conversation” to a game of chess, it’s already possible. But those aren’t the “full” Turing Test; they’re domain-specific variants. Your criticism would only apply to the latter.
Are AI players actually indistinguishable from humans in Chess? Could an interrogator not pick out consistent stylistic differences between equally-ranked human and AI players?
Actually, don’t they currently limit conversations to a preselected topic? And still the chatbots fail.
I’m not really sure what you’re driving at here. We don’t have any software even close to being able to pass the TT right now; at the moment, using relatively easy subsets of the TT is the most useful thing to do. That doesn’t mean that anyone expects that passing such a subset counts as passing the general TT.
I was just noting that current “Turing Tests” are exactly what was being used as an example of something-that-is-not-a-Turing-test. It’s mildly ironic, that’s all.
Incidentally, humans have been known to fail Turing tests once in a while, being mistaken for computers instead.
Also, there has been one ELIZA variation that managed to come across as human even to people who knew it might be a program: it imitates a paranoid schizophrenic who believes the Mafia is out to get him, and if asked about something else, it insists on talking about the Mafia instead. Apparently, psychiatrists couldn’t tell it wasn’t a real patient… which probably says more about people with severe mental illness than it does about artificial intelligence.
Sure. And there’s a vast number of artificial systems out there that can successfully emulate catatonic humans over a text interface, or even perfectly healthy sleeping humans. As you say, none of that says much about AI.
Guessing the teacher’s password describes a common human behavior. An AI that behaves the same way not only passes the Turing test; it might really be said to be as intelligent as a relatively stupid human.
Also, there’s a huge quantitative difference between a student repeating what the teacher said, and an AI repeating at will everything written in all digitized books and Internet sites it has read. An AI that is a little less intelligent than a human in some areas will still be vastly more intelligent than any human in most other areas. Even if humans remain much better at certain tasks, it may not truly matter.
Which means that at equivalent performance, the student has more skill, but less data. The question is whether truly general intelligence can be achieved or approximated via mass data. So far, data-based achievements have been less generalisable than might have been supposed (eg Watson).
The concept of the Turing test fails to impress me in both directions (I’d guess an abundance of both false positives and false negatives).
If penguins had to determine whether humans have reached penguin-level intelligence, being able to mimic a penguin’s mating call would be just the sort of test that penguins would devise. But it’s not a proper test of intelligence; it’s a test of penguin-mimicry by creatures so simplistic (or simplex in the terminology of Samuel R. Delany) as to think that “Intelligence” means “Acting Much Like A Penguin Would”.
The difference here is that humans are generally intelligent, whereas penguins are not. Thus, you could imitate a penguin without possessing general intelligence, but that won’t be enough to imitate a human.
This is probably optimistic. There might be large areas of thought which are within our theoretical capacity that are still more or less blank spots for us.
We’re probably still the nearest thing to generally intelligent on the planet.
Hmm, after reading the post again I have downvoted it as too weak for Main. No new concepts, no interesting questions, just some musings. Not even a clear definition of what “conscious” means. The idea that “a secret Turing test is better than an overt one” is fine for Discussion, maybe, or for an open thread.
A definition of ‘conscious’ is a high bar to clear! :-) One minor point is precisely that we don’t know what the Turing test is measuring—it’s measuring something related to intelligence and consciousness, possibly, but what exactly isn’t clear.
I think the more relevant points are the flaw in the Turing test (what should we expect after the headlines “AI passes the Turing test”?), and the possibility of quasi-p-zombies.
I disagree, but will take your judgement into account.
This must be determined empirically. Anyone have access to a reputable news source?
Actually, scratch that, let’s just get the Onion to write it up.
If we don’t know what “intelligence” and “consciousness” are anyway, then it’s a distinction without a difference.
Just because we don’t know what something is beyond a few vague verbal statements doesn’t mean we can’t know a few things it pretty definitely isn’t. See: most of human history.
I am compelled to link to this amusing short science fiction story.
Looking at the sidebar, I was certain that this comment was a link to this xkcd.
I had completely forgotten about that one.
The classical problem is that the Turing Test is behavioristic and only provides a sufficient criterion (rather than replacing talk about ‘intelligence’, as Turing suggests). And it doesn’t provide a proper criterion in that it relies on human judges, who, in practice, tend to take humans for computers.

Of course the test is meant to be open-ended, in that “anything one can talk about” is permitted, including stuff that’s not on the web. That is a large set of intelligent behavior, but a limited one, so the “design to the test” you are pointing out is precisely what chatterbot people use. And it’s usually pretty dumb, and provides no insight into human flexibility in using language (which can be used for more than “talking about stuff”).

I also suspect that we’ll see the test passed pretty soon, in the sense of fooling non-sophisticated judges. So far, results are hopeless, however! The main weakness is that the machines don’t do much analysis of the conversation so far.

Essentially, we know from similar problems (e.g. speech recognition) that one can get very good, but somewhere in the upper 90% there is a limit that’s very hard to break without using more data.
The 90% argument is very pertinent, and may be the thing that preserves the Turing test as a general intelligence test.
Or maybe not! We’ll have to see...
I’m not entirely sure what you mean by “Turing Test”. As far as I understand, the test is not multiple choice; instead, you just converse with the test subject as best you can, then make your judgement. And, since the only judgements you can make are “human” and “non-human”, the test doesn’t tell you how well the test subject can solve urban navigation problems or whatever; all it tells you is how good the subject is at being human.
The trick, though, is that in order to converse on a human level, the test subject would have to implement at least some form of AGI, because this is what humans do. This does not mean that the AI would be able to actually solve any problem in front of it, but that’s ok, because neither can humans. The Turing test is designed to identify human-level AIs, not Singularity-grade quasi-godlike uber-minds.
You dismiss “chattering” as some sort of a merely “linguistic” trick, but that’s just an implementation detail. Who cares whether the AI runs on biological wetware or a “narrow statistical machine”? If I can hold an engaging conversation with it, I’m going to keep talking until I get tired, statistics or no statistics. I get to have interesting conversations rarely enough as it is...
That is the premise I’m questioning here. I’m not currently convinced that a super chatterbot needs to demonstrate general intelligence.
I understand what you’re saying, but I don’t understand why. I can come up with several different interpretations of your statement:
Regular humans do not need to utilize their general intelligence in order to chat, and thus neither does the AI.
It’s possible for a chatterbot to appear generally intelligent without actually being generally intelligent.
You and I are talking about radically different things when we say “general intelligence”.
You and I are talking about radically different things when we say “chatting”.
To shed light on these points, here are some questions:
Do you believe that a non-AGI chatterbot would be able to engage in a conversation with you that is very similar to the one you and I are having now?
Admittedly, I am not all that intelligent and thus not a good test case. Do you believe that a non-AGI chatterbot could be built to emulate you personally, to the point where strangers talking with it on Less Wrong could not tell the difference between it and you?
That is what I’m arguing may well be the case.
Ok, that gives me one reference point, let me see if I can narrow it down further:
Do you believe that humans are generally intelligent? Do you believe that humans use their general intelligence in order to hold conversations, as we are doing now?
Edit: “as we are doing now” above refers solely to “hold conversations”.
Actually, this seems surprisingly plausible, thinking about it. A lot of conversations are on something like autopilot.
But eventually even a human will need to think in order to continue.
Did you know that we already have instances of things that pass the Turing test?
And more surprisingly, that we don’t generally consider them conscious?
And the most amazing of all: That they have existed for probably at the very least a hundred thousand years (but possibly much more)?
I am talking about the characters in our dreams.
They fool us into thinking that they are conscious! That they are the subjects of their own worlds just as people presumably are when awake.
You can have a very eloquent conversation with a dream character without ever noticing there is any apparent lack of consciousness. You can even ask them about their own consciousness (I have done so).
The riddle to why this is possible involves a very deep state of affairs that we are scarcely aware of in daily life. Namely, that your phenomenal self is, just as well, a dream character.
They borrow our machinery for consciousness—it’s not clear to me that they aren’t.
Also, it’s rare for a dream to be so coherent that a transcript would convince an unimpaired (conscious) human.
Here is a simple algorithm that passes the Turing test:
Using [EDIT: random] quantum events, generate random bits and send them to the output.
In some Everett branches this algorithm passes the test.
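For concreteness, the pass probability of the random-bits algorithm is easy to compute exactly (a minimal sketch; the transcript lengths are illustrative):

```python
import random
from fractions import Fraction

def random_bits(n):
    # The algorithm in question: emit n uniformly random bits.
    return [random.getrandbits(1) for _ in range(n)]

def chance_of_matching(n):
    # Probability that pure chance reproduces one particular
    # n-bit transcript: 2^-n (exact, via rational arithmetic).
    return Fraction(1, 2 ** n)

print(chance_of_matching(3))  # 1/8
# Even a one-kilobyte conversation succeeds in almost no branches:
print(chance_of_matching(8192) < Fraction(1, 10 ** 2000))  # True
```

Nonzero in "some Everett branches", as stated, but the measure of those branches shrinks exponentially in the length of the conversation.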
(Somewhat related: If Mr. Searle shows me a “giant lookup table” which passes the Turing test and asks me whether it is intelligent, my response will be: “Stop playing silly games and show me the algorithm that created the lookup table.”)
(But mostly doesn’t, with confidence much more reliable than the expected variability of any given Turing tester.)
Incidentally, the same algorithm (combined with some kind of synthesising device) can also create an actual living creature that we would expect (and desire) to pass such a test. This is just (even more) ridiculously unlikely (and/or occurs in fewer descendant Everett branches, depending on nomenclature).
I assume (and imply that it would be better) that you intend the ‘random bits’ part to be the important point, more so than the ‘quantum’ part? Considering random ‘passing’ output reminds us that there are certain limits to the strength of the conclusions we can draw from such a test. The ‘quantum’ part just prevents distracting side-tracks like passing the buck to the pseudo-random number generator. i.e. I would consider your point to be almost as strong even if the universe went around collapsing away those ‘Everett branches’ and you had to speak of “sometimes” instead of “In some Everett branches”.
You are right. My point was that there are two ways a “giant lookup table” could pass a Turing test:
a) It could be constructed by an enormous superhuman intelligence, in which case stop speaking about the table and show me the intelligence that created it.
b) It just got lucky… which proves nothing, because if you are lucky enough, you can pass the Turing test without the lookup table, just by sending random bits to the output.
In this case the game of ‘find the intelligence’ traces back through the random algorithm and to the person who selected the overwhelmingly improbable random outcome out of the set of possible random outcomes. That is, the algorithm that has produced apparently conscious output and can be said to result in a pass in the Turing Test is any algorithm that can take the string “imagine that you have a random bitstring that happens to look like it is conscious” and create imaginary bitstrings that instantiate that.
It ‘proves nothing’ in the same way that all of science has proved nothing. “If we are lucky enough” every experimental test we have done to conclude that gravity exists could have resulted from a physics where mass is constantly accelerated in random directions. If so, let’s hope that our luck keeps holding...
Demonstrating that something is overwhelmingly likely is, indeed, a different thing than proving that something has probability zero. But it is still rather useful information.
I’m not really sure what you mean by “conscious from a linguistic perspective,” and judging from your responses to other comments I infer that neither are you.
So let me try some simpler questions: is Watson conscious from a Jeopardy-game-playing perspective? Is a system that can perfectly navigate a maze conscious from a maze-navigation perspective? Is an adding machine conscious from an adding-numbers-together perspective?
If you answer “no” to any of those, then I don’t know what you mean well enough to proceed, but it might help if you explain why not.
If your answers are “yes” to all of those, then I guess I agree that it’s logically possible for a system to be (in your terms) conscious from a linguistic perspective without necessarily being something I would consider near-human intelligence, but I’m very skeptical of our ability to actually build such a system, and I’m not too worried about the prospect.
By “conscious from a linguistic perspective”, I mean that we cannot distinguish it from a conscious being purely through linguistic interactions (ie Turing tests). I probably should have said “conscious from a linguistic (text-based) perspective” to be more precise.
OK, fair enough. The second response applies.
Said differently, I expect that the set of cognitive activities required to support linguistic behavior that we can’t distinguish from, say, my own (supposing here that I’m a conscious being) in a sufficiently broad range of linguistic interactions correlates highly enough with any other measure of “this is a conscious being” I might care to use that any decision procedure that files a system capable of such activities/behavior as “nonconscious” will also file conscious beings that way.
I wrote up an idea for an alternate test: cloze deletion tests. The idea is that instead of imitating a human, you predict what a human would say. In theory these are the same task; in practice they might be slightly different. See also the Hutter Prize, for compressing Wikipedia, a similar challenge.
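A cloze item in its simplest form might look like the following (a minimal sketch; the exact-match scoring is my own illustration, not necessarily what the linked proposal uses):

```python
def make_cloze(text, target, blank="___"):
    # Delete one target word; the candidate must predict what the
    # human author originally wrote there.
    assert target in text
    return text.replace(target, blank, 1)

def score(prediction, original):
    # Exact-match scoring; a real test would average over many items
    # and might give partial credit for plausible alternatives.
    return 1.0 if prediction == original else 0.0

item = make_cloze("The cat sat on the mat.", "mat")
print(item)                 # The cat sat on the ___.
print(score("mat", "mat"))  # 1.0
print(score("rug", "mat"))  # 0.0
```

The attraction is that scoring is mechanical: no human judge is needed, only a corpus of human-written text.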
Only peripherally related: I remember reading an essay ages ago about someone, probably Hofstadter, being called in by a bunch of college students to administer a semi-Turing test (that is, talk to one system and decide whether it’s intelligent) to a particularly impressive system they’d encountered while hacking into computer networks, but I can’t find it now. Does anyone else share this recollection in a more useful form?
[EDITED to remove some details that I put in to help TheOtherDave tell whether we’re talking about the same thing. We were. It’s in Hofstadter’s “Metamagical Themas”. Details removed lest they spoil it for others.]
A few random extracts (i.e., bits that I happen to remember from the transcript) that may help determine whether this is what you had in mind: “I like to understand up without seventeen clams”; “What are arms? That information is classified.” “That’s nice. What else do you like to call ‘GEB’, Doug?”
Yeah, that’s the one I meant, although I’d been refraining from giving away the punchline.
Thank you for the extra Googlable terms, though, they helped me find it!
Somewhat embarrassingly, it’s on a webpage I had been looking at earlier while searching for it, just on the bottom half of it.
Punchline now redacted :-).
First, you mean “pass”, not “past”, I assume. Second, what definition of “conscious” are you using here? If you reject p-zombies, then it has to be behavioral. From this post it seems like this behavior is “able to pass a secret Turing test most actual humans can pass”, which is not a great definition, given how many actual humans fail the existing Turing test miserably.
Are you referring to some specific thing here, or is this a general “people are dumb” sort of thing? If it’s the former, elaborate please, this sounds interesting!
I highly recommend his book Most Human Human, for an interesting perspective on how (most) humans pass the Turing Test.
Interestingly enough, I find numerous references to Clay and this incident, but very little in the way of the transcript it refers to. This was the best I could find (she is “Terminal 4”):
She does sound somewhat inhuman to me in the latter excerpt you quoted. (But then again, so do certain Wikipedia editors that I’m pretty sure are human.)
Fascinating! Thank you, this is definitely going on my reading list.
It happens once in a while in online chat, where real people behave like chatterbots. Granted, with some probing it is possible to tell the difference, if the suspect stays connected long enough, but there were cases where I was not quite sure. In one case the person (I’m fairly sure it was a person) posted links to their own site and generic sentences like “there are many resources on ”. In another case the person was posting online news snippets and reacted with hostility to any attempts to engage, which is a perfect way to discourage Turing probing if you design a bot.
Imagine a normal test, perhaps a math test in a classroom. Someone knows math but falls asleep and doesn’t answer any of the questions. As a result, they fail the test. Would you say that the test can’t detect whether someone can do math?
Technically, that’s correct. The test doesn’t detect whether someone can do math; it detects whether they are doing math at the time. But it would be stupid to say “hey, you told me this tests if someone can do math! It doesn’t do that at all!” The fact that they used the words “can do” rather than “are doing at the time” is just an example of how human beings don’t use language like a machine, and objecting to that part is pointless.
Likewise, the fact that someone who acts like a chatterbot is detected as a computer by the Turing test does mean that the test doesn’t detect whether someone is a computer. It detects whether they are acting computerish at the time. But “this test detects whether someone is a computer” is how most people would normally describe that, even if that’s not technically accurate. It’s pointless to object that the test doesn’t detect whether someone is a computer on those grounds.
It doesn’t even test whether someone’s doing math at the time. I could be doing all kinds of math and, in consequence, fail the exam.
I would say, rather, that tests generally have implicit preconditions in order for interpretations of their results to be valid.
Standing on a scale is a test for my weight that presumes various things: that I’m not carrying heavy stuff, that I’m not being pulled away from the scale by a significant force, etc. If those presumptions are false and I interpret the scale readings normally, I’ll misjudge my weight. (Similarly, if I instead interpret the scale as a test of my mass, I’m assuming a 1g gravitational field, etc.)
Taking a math test in a classroom makes assumptions about my cognitive state—that I’m awake, trying to pass the exam, can understand the instructions, don’t have a gerbil in my pants, and so forth.
Oops! Now corrected.
I’m not using one. Part of the problem is that the Turing test is measuring something, but it’s not entirely clear what.
Surely it is clear what the Turing test is measuring. It is measuring the ability to pass for a human under certain conditions.
A better question is whether (and in what way) the ability to pass for a human correlates with other qualities of interest, notably the ones we vaguely describe as “intelligence” or “consciousness”.
I always thought (and was quite convinced, though I can’t now reconstruct why) that the Turing test was explicitly designed as a “sufficient” rather than a “necessary” kind of test. As in, you don’t need to pass it to be “human-level”, but if you do, then you certainly are. (Or, more precisely: once we’ve established that we can’t tell the difference, who cares? With a similar sentiment for exactly what it is we’re comparing for “human-level”: it’s something about how much smarter we are than monkeys, we’re not sure quite what, but if we can’t tell the difference, you’re in.) A brute-force, first-try, upper-bound sort of test.
But I get the feeling from some of the comments that it claims more than that (or maybe doesn’t disclaim as much). Am I missing some literature or something?
I personally agree with your comment (assuming I understand it correctly). As far as I can tell, however, some people believe that merely being able to converse with humans on their own level is not sufficient to establish the agent’s ability to think on the human level. I personally think this belief is misguided, since it privileges implementation details over function, but I could always be wrong.
IIRC, Turing introduces the concept in the paper as a sufficient but not necessary condition, as you describe here.
I feel it may be neither necessary nor sufficient. It would be a pretty strong indication, but wouldn’t be enough on its own.
Yes, that’s the issue.
Is there any way we can test for consciousness without using some version of the Turing Test? If the answer is “no”, then I don’t see the point of caring about it.
As for “intelligence”, it’s a little trickier. There could be agents out there who are generally intelligent yet utterly inhuman. The Turing Test would not, admittedly, apply to them.
We could use those extended versions of the Turing test I mentioned: anything the computer hasn’t been specifically optimised for would work.
I am not sure what you mean by “optimized for”. What if we made an AI that was really good at both chatting and playing music? It could pass your extended test (while many humans, myself included, would fail). Now what?
Then I’d test it on 3D movement. The point is that these tests have great validity as tests for general intelligence (or something in the vicinity), as long as the programmer isn’t deliberately optimising or calibrating their machine for them.
If you’d designed a chatterbot and it turned out to be great at playing music (and that wasn’t something you’d put in by hand), then that would be strong evidence for general intelligence.
The deliberate optimization on the part of a designer is just an example of the sort of thing you are concerned about here, right? That is, if I used genetic algorithms to develop a system X, and exposed those algorithms to a set of environments E, X would be optimized for E and consequently any test centered on E (or any subset of it) would be equally unreliable as a test of general intelligence… the important thing is that because X was selected (intentionally or otherwise) to be successful at E, the fact that X is successful at E ought not be treated as evidence that X is generally intelligent.
Yes?
Similarly, the fact that X is successful at tasks not actually present in E, but nevertheless very similar to tasks present in E, ought not be treated as evidence that X is generally intelligent. A small amount of generalization from initial inputs is not that impressive.
The question then becomes how much generalization away from the specific problems presented in E is necessary before we consider X generally intelligent.
To approach the question differently—there are all kinds of cognitive tests which humans fail, because our cognitive systems just weren’t designed to handle the situations those tests measure, because our ancestral environment didn’t contain sufficiently analogous situations. At what point do we therefore conclude that humans aren’t really generally intelligent, just optimized for particular kinds of tests?
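The distinction between being optimized for an environment E and being generally intelligent can be shown with a deliberately silly toy (entirely my own construction): a lookup table selected to succeed on E is indistinguishable from a general solver on any test drawn from E, and only a task E never contained separates them.

```python
# Environment E: single-digit addition problems with their answers.
E = {(a, b): a + b for a in range(10) for b in range(10)}

def memorizer(a, b):
    """'Optimized for E': pure lookup, no capability behind the answers."""
    return E.get((a, b))  # returns None for anything outside E

def general_solver(a, b):
    """Actually implements the capability the test is meant to detect."""
    return a + b

# Both pass any test drawn from E...
print(memorizer(3, 4), general_solver(3, 4))      # 7 7
# ...but only one survives a task that E never contained.
print(memorizer(123, 456), general_solver(123, 456))  # None 579
```

The open question the comment raises maps onto how far outside E's range a problem must be before success on it counts as evidence of the general capability rather than of near-neighbour interpolation.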
I don’t know how useful the Turing Test is. It is (as I understand it) supposed to tell when a computer has become conscious, by comparing its responses to human responses. Yet only in the case of an uploaded mind would we expect the computer to be like a human. In practically every other situation we would have given the computer a variety of different properties. The possible mind space of conscious beings is vastly larger than the mind space of conscious humans.
True, but we are the ones creating the AI. I suspect a programmer that only has access to human thinking would leave their mark upon any such machine.
And, since we WANT something that can relate to us, we must test its capacity for human-like behavior.
An AI that can only relate to intelligent fungi from some far-off star would be absolutely useless to us, and would likely find us equally useless. No common ground would mean no need for contact or commerce. At the risk of sounding a lil’ Ferengi, I want a machine intelligence I can do business with.
I’ve always assumed I would ask the testee to solve a few toy problems expressed in natural language, to see if they can think about things other than conversation itself. After all, in real life I ask people things. Is this considered cheating?
Definitely. Hence do it!
This might be a more promising approach—testing the computer on novel (math or other) problems expressed in natural language. Of course, they’d have to be truly novel, each and every time...
I don’t think there are many novel math problems easy enough that you wouldn’t falsely label people as chatbots when they say they don’t know how to solve them. And if you accept merely promising-sounding attempts at a solution, that opens up a cheating mechanism for the AI to fool you back by making mushy statements related to the topic.
Math problems are one thing; “problems” expressed in natural language are both easier to invent and (if you’re human) to solve, I should think.
Also, math problems don’t seem AGI-complete, especially if you’re dealing with an “average” human player.
Why not test its ability to negotiate and trade, as well as to improvise in human behavior? If it can write you a poem for some money, then invest that money in the stock market, and later use the resulting fortune to benefit itself (an upgrade, perhaps?), then you’re probably dealing with an intelligent being, no? Bonus points if it studies other methods of succeeding, and is willing to benefit other intelligent beings.
If the Turing test is somehow restricted, and you’re just supposed to have a normal conversation, it can be faked. If you’re allowed to ask anything at all, such as offer the AI’s source code and ask how it could be improved, then you have a strong AI. One that dominates humans in every field. I don’t know if it’s necessarily conscious, but it’s definitely intelligent.
This isn’t necessary. You’d be better off using the opposite approach. Keep the capabilities of machines public, and tailor your questions to them. If you know computers are good at scrabble, but bad at diplomacy, you might play a game of diplomacy, but not scrabble.
The problem with this line of reasoning is that the Turing test is very open-ended. You have no idea what a bunch of humans will want to talk to your machine about. Maybe about God, maybe about love, maybe about remembering your first big bloody scrape as a kid… Maybe your machine will get some moral puzzles, maybe logical paradoxes, maybe some nonsense.
And once a machine is truly able to sustain a long conversation on any topic, well, at this point we get back to the interesting question of what does “intelligent” mean.
This was more of a challenge before the web, with its trillions of lines of text on every subject. Because of this, I no longer consider the text-based test all that good: a truly open-ended test would need to deviate from the text-based format nowadays.
But you can keep on adding specifics to a subject until you arrive at something novel. I don’t think it would even be that hard: just Google the key phrases of whatever you’re about to say, and if you get back results that could be smooshed into a coherent answer, then you need to keep changing up or complicating.
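The complicate-until-novel procedure can be sketched in a few lines (this is my own rough rendering: a local corpus stands in for the web search, and the phrase length and threshold are arbitrary choices): keep extending the prompt until too few of its key phrases turn up verbatim.

```python
# Check whether a prompt's key phrases can be found in a reference
# corpus; if most can, the prompt is not novel enough and needs to
# be changed up or complicated further.

def key_phrases(text, n=3):
    """All n-word phrases in the text, lowercased."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def looks_novel(prompt, corpus, threshold=0.5):
    """True if fewer than `threshold` of the phrases appear in the corpus."""
    phrases = key_phrases(prompt)
    if not phrases:
        return False
    hits = sum(any(p in doc.lower() for doc in corpus) for p in phrases)
    return hits / len(phrases) < threshold

corpus = ["many people ask: should you pull the lever in the trolley problem"]
print(looks_novel("should you pull the lever", corpus))  # False: well-covered
print(looks_novel("should you pull the lever if the trolley is your clone",
                  corpus))  # True: the added specifics aren't in the corpus
```

Adding specifics to a stock puzzle drives the hit ratio down, which is precisely the "keep complicating" loop described above.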
Where does this leave mute humans, or partially paralyzed humans, or any other kind of human who can’t verbally speak your language? If we still classify them as “human”, then what reason do you have for rejecting the AI?
That’s why the test only offers a sufficient condition for intelligence, not a necessary one. At least, that’s the standard view.
The Turing test retains validity as a general test, on all systems that are not specifically optimised to pass the test.
For instance, the Turing test is good for checking whether whole brain emulations are conscious. Conversation is enough to check that humans are conscious (and if a dog or dolphin managed conversation, it would work as a test for them as well).
This is a circular argument, IMO. How can you tell whether you’re talking to a whole brain emulation or a bot designed to mimic a whole brain emulation?
By knowing its provenance. Maybe, when we get more sophisticated and knowledgeable about these things, by looking at its code.
Similarly with humans: when assessing whether they’re lying, knowing the details of their pasts (for instance, whether they were trained to lie professionally) should affect your assessment of their performance.
sorry
P.S.: Whether all this has to do with conscious experience (“consciousness”) we don’t know, I think.