I mean move out of whatever format was agreed on. Move away from text-based systems (a truly smart AI could download voice software if it had to—make sure it has time to do so). Unilaterally extend the deadline. Offer side deals or bets with real money (which a smart AI could acquire or pretend to have). Insist the subject create videos on specific themes.
Do stuff you’re not supposed/expected to do.
There is a risk of asking more from the AI than a human could deliver.
Imagine a mute human, randomly taken from the street, who has to download voice software and use it to communicate with the judge, without making the judge suspect that they are the AI. How much chance of success would they have? Similarly, how many people would lose bets? Etc.
On the other hand, if we prefer to err on the side of underestimating the AI rather than overestimating it, then the more difficult the task, the better, even if some humans couldn’t solve it. But then… why not simply give the AI the task of convincing humans that it is intelligent, without any further rules?
Let’s contrast two situations:
1) We build a whole brain emulation, uploaded from a particular brain. We subject that WBE to a Turing test, which it passes. Is it conscious? I’d argue yes: even without a definition of consciousness, we must still grant it to the WBE, if we grant it to humans.
2) Same thing, but instead of the WBE, we have a de novo computer system designed specifically to pass the Turing test via mass data crunching. I’d say we now need more proof that this system is conscious.
Why the difference? In the first case the WBE was optimised for being an accurate representation of a brain. So if it passes the Turing test, then it probably is an accurate representation, as it is hard to conceive of a very flawed brain representation that also passes that test.
In the second case, the system was optimised for passing the test only. So it is very possible to conceive it passing the test, but not having the other attributes of consciousness or intelligence. So our tests have to be more rigorous in the second case.
Not that I’ve got particularly good ideas how to do this! I just note that it needs to be done. Maybe “long” Turing tests (6 months or more) might be enough. Or maybe we’ll need to disconnect the AI from the internet (maybe give it a small video feed of some popular TV shows—but only give it info at human-bandwidth), wait for human society to evolve a bit, and test the AI on concepts that weren’t available when it was disconnected.
The form of the AI is also relevant—if it’s optimised for something else, then passing the Turing test is a much stronger indication.
So it is very possible to conceive it passing the test, but not having the other attributes of consciousness or intelligence.
What are these other attributes, as distinct from the attributes it would need to pass the Turing Test?
Sure, you could ask it to make videos of itself skating or whatever, but a WBE wouldn’t be able to do that, either (seeing as it doesn’t have a body to skate with). Does it mean they both fail?
you could ask it to make videos of itself skating or whatever
I don’t think he meant it that way. I read it as “make a video montage of a meme” or the like. The point being that such a task exercises more elements of “human intelligence” than just chatting, like lexical and visual metaphors, perception of vision and movement, (at least a little bit of) imagination, planning and execution of a technical task, (presumably) using other software purposefully, etc. It is much harder to plan for and “fake” (whatever that means) all of that than to “fake” a text-only test with chat-bot techniques.
Of course, a real man who is (for instance) blind might not be able to do that particular task, but he would be able to justify that by being convincingly blind in the rest of the exchange, and to perform something analogous in other domains. (Music or even reciting something emphatically, or perhaps some tactile task that someone familiar with being blind might imagine.) The point I think is not to tie it to a particular sense or the body, but just to get a higher bandwidth channel for testing, one that would be so hard to fake in close to real time that you’d pretty much have to be smarter to do it.
Testing for consciousness seems to be so hard that text chat is not enough (or at least we’re close to being better at faking it than testing for it), so I guess Stuart suggests we take advantage of the “in-built optimizations” that let us do stuff like fake and detect accents, or infer distances (or, in some contexts, status and other things) from differences in apparent height. Things that we don’t yet fake well, and even when we do, it’s hard to mix and integrate them all.
I read it as “make a video montage of a meme” or the like.
If you told me personally to do that, I may not pass the test, either. And I personally know several humans who cannot, f.ex., “use other software purposefully”. I think these kinds of challenges are a form of scope creep. We are not trying to test whether the AI is a polymath, just whether it’s human or not.
It is much harder to plan for and “fake” (whatever that means) all of that than to “fake” a text-only test with chat-bot techniques.
I disagree; that is, while I agree that participating in many types of interactions is more difficult than participating in a single type of interaction, I disagree that this degree of difficulty is important.
As I said before, in order to hold an engaging conversation with a human through “fakery”, the AI would have to “fake” human-level intelligence. Sure, it could try to steer the conversation toward its own area of expertise—but firstly, this is what real humans do as well, and secondly, it would still have to do so convincingly, knowing full well that its interlocutor may refuse to be steered. I simply don’t know of a way to perfectly “fake” this level of intelligence without actually being intelligent.
You speak of “higher bandwidth channels for testing”, but consider the fact that there are several humans in existence today, at this very moment, whose interaction with you consists entirely of text. Do you accept that they are, in fact, human? If so, then what’s the difference between them and (hypothetical) Turing-grade AIs?
If you told me personally to do that, I may not pass the test, either. And I personally know several humans who cannot, f.ex., “use other software purposefully”. I think these kinds of challenges are a form of scope creep. We are not trying to test whether the AI is a polymath, just whether it’s human or not.
I don’t believe it’s scope creep at all. The requirement isn’t really “make a video”. The requirement is “be able to do some of the things in the category ‘human activities that are hard to automate’”. Making a video is a specific item in the category, and the test is not to see that someone can do any specific item in the category, just that they can do some of them. If the human questioner gets told “I don’t know how to make a video”, he’s not going to say “okay, you’re a computer”, he’s going to ask “okay, then how about you do this instead?”, picking another item from the category.
(Note that the human is able to ask the subject to do another item from the category without the human questioner being able to list all the items in the category in advance.)
The requirement is “be able to do some of the things in the category ‘human activities that are hard to automate’”
That is starting to sound like a “Turing Test of the gaps”.
“Chatting online is really hard to automate, let’s test for that. Ok, we’ve automated chatting, let’s test for musical composition, instead. Ok, looks like there are AIs that can do that. Let’s test it for calculus...”
My tests would be: have a chatterbot do calculus. Have a musical bot chat. Have a calculus bot do music.
To test for general intelligence, you can’t test on the specific skill the bot’s trained in.
Try to teach the competitor to do some things that make sense to humans and some things that do not make sense to humans, from wildly different fields. If the competitor seems to be confused by things which are confusing to people and learns things which are not confusing, it is more likely to be thinking instead of parroting.
For example, you could explain why no consistent logical system can trust itself, and then ask the competitor if they think their way of thinking is consistent; if they think it isn’t, ask them if they think that they could prove literally anything using their way of thinking. If they think it is, ask them if they would believe everything that they can prove to be true.
Thinking entities will tend to believe that they can’t prove things which are false, and thus that everything that they can prove is true. Calculating entities run into trouble with those concepts.
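For reference, a compressed sketch of the standard results being gestured at here, on the usual assumption that $T$ is a consistent, recursively axiomatized theory containing basic arithmetic, with provability predicate $\Box_T$:

    \text{Gödel II:}\quad T \nvdash \mathrm{Con}(T)
    \text{Explosion:}\quad \text{if } T \vdash \varphi \text{ and } T \vdash \neg\varphi, \text{ then } T \vdash \psi \text{ for every } \psi
    \text{Löb:}\quad T \vdash (\Box_T \varphi \rightarrow \varphi) \;\Longrightarrow\; T \vdash \varphi

Löb’s theorem is what makes the last question a trap: a system that proves “everything I can prove is true” for a given sentence thereby proves that sentence.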
Less meta, one could explain the magical thinking expressed in The Secret and ask why some people believe it and others don’t, along with asking why the competitor does or doesn’t.
To test for general intelligence, you can’t test on the specific skill the bot’s trained in.
I think we might have different definitions of what “general intelligence” is. I thought it meant something like, “being able to solve novel problems in some domain”; in this case, our domain is “human conversation”. I may be willing to extend the definition to say, ”...and also possessing the capacity to learn how to solve problems in some number of other domains”.
Your definition, though, seems to involve solving problems in any domain. I think this definition is too broad. No human is capable of doing everything; and most humans are only good at a small number of things. An average mathematician can’t compose music. An average musician can’t do calculus. Some musicians can learn calculus (given enough time and motivation), but others cannot. Some mathematicians can learn to paint; others cannot.
Perhaps you mean to say that humans are not generally intelligent, and neither are AIs who pass the Turing Test? In this case, I might agree with you.
(Most) humans possess a certain level of general intelligence. Human groups, augmented by automation tools, and given enough time, possess a much more advanced general intelligence. The “no free lunch theorems” imply that it’s impossible to get a fully general intelligence in every environment, but we come pretty close.
I’ve somewhat refined my views of what would count as general intelligence in a machine; now I require mainly that it not be extremely stupid in any area that humans possess minimal competence at. Out-of-domain tests are implicit ways of testing for this, without doing the impossible task of testing the computer in every environment.
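(For what it’s worth, the optimisation version of those theorems, in Wolpert and Macready’s phrasing, says roughly that, averaged over all objective functions $f$ on a finite domain, any two non-repeating search algorithms $a_1$ and $a_2$ see the same distribution of observed cost sequences:

    \sum_f P(d^y_m \mid f, m, a_1) = \sum_f P(d^y_m \mid f, m, a_2)

where $d^y_m$ is the sequence of cost values seen after $m$ distinct evaluations. Gains in one class of environments are paid for in another, which is why “general” here has to mean “not hopeless in any environment humans actually care about” rather than “optimal everywhere”.)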
I’m not sure what that criticism is trying to say.
Assuming it’s an analogy with the god of the gaps, you might be saying that if the computer can pass the test the questioner can always pick a new requirement that he knows the computer can’t pass.
If this is what you are saying, then it’s wrong because of the flip side of the previous argument: just like the test doesn’t check to see if the subject succeeds in any specific item, it also doesn’t check to see if the subject fails in any specific item. In order for a computer to fail the test because of inability to do something, it has to show a pattern of inability that is different from the pattern of inability that a human would show. The questioner can’t just say “well, computers aren’t omniscient, so I know there’s something the computer will fail at”, pick that, and automatically fail the computer—you don’t fail because you failed one item.
Assuming it’s an analogy with the god of the gaps, you might be saying that if the computer can pass the test the questioner can always pick a new requirement that he knows the computer can’t pass.
Yep, this is it. I see no reason why we should hold computers to a higher standard than we do our fellow humans.
In order for a computer to fail the test because of inability to do something, it has to show a pattern of inability that is different from the pattern of inability that a human would show.
I’m not sure what kind of a “pattern of inability” an average human would show; I’m not even convinced that such a pattern exists in a non-trivial sense (f.ex., the average human surely cannot fly by will alone, but it would be silly to test for that).
All the tests that were proposed so far, such as “create a video” or “compose a piece of music”, target a relatively small subset of humans who are capable of such tasks. Thus, we would in fact expect the average human to say, “sorry, I have no ear for music” or something of the sort—which is also exactly what an AI would say (unless it was actually capable of the task, of course). Many humans would attempt the task but fail; the AI could do that, too (by design or by accident). So, the new tests don’t really tell you much.
No computer is going to fail the Turing test because it can’t compose a piece of music. The questioner might ask it to do that, but if it replies “sorry, I have no ear for music” it doesn’t fail—the questioner then picks something else. If the computer can’t do that either, and if the questioner keeps picking such things, he may eventually get to the point where he says “Okay, I know there are some people who have no ear for music, but there aren’t many people who have no ear for music and can’t make a video and can’t paint a picture and…” He will then fail the computer because although it is plausible that a human can’t do each individual item, it’s not very plausible that the human can’t do anything in the list. No specific item is a requirement to be a human, and no specific inability marks the subject as not being human.
if the questioner keeps picking such things, he may eventually get to the point where he says “Okay, I know there are some people who have no ear for music, but there aren’t many people who have no ear for music and can’t make a video and can’t paint a picture and…” He will then fail the computer because although it is plausible that a human can’t do each individual item, it’s not very plausible that the human can’t do anything in the list.
But if all the things in that conjunction are creative endeavors, why do you think a human not being able to do any of them is implausible? I have no ear for music, don’t have video-creation skills, can’t paint a picture, can’t write a poem, etc. There are many similar people, whose talents lie elsewhere, or perhaps who are just generally low on the scale of human talent.
If you judge such people to be computers, then your success rate as a judge in a Turing test will be unimpressive.
If the questioner is competent, he won’t pick a list where it’s plausible that some human can’t do anything on the list. If he does pick such a list, he’s performing the questioning incompetently. I think implicit in the idea of the test is that we have to assume some level of competency on the part of the questioner; there are many more ways an incompetent questioner could fail to detect humans than just asking for a bad set of creative endeavors.
(I think the test also assumes most people are competent enough to administer the test, which also implies that the above scenario won’t happen. I think most people know that there are non-creative humans and won’t give a test that consists solely of asking for creative endeavors—the things they ask the subject to do will include both creative and non-creative but human-specific things.)
I think this entire thread is caused by, and demonstrates, the fact that we increasingly have no idea what the heck we’re even trying to measure or detect with the Turing test (is it consciousness? human-level intelligence? general intelligence? what?) …
… which is entirely unsurprising, since as I say in another comment on this post, the Turing test isn’t meant to measure or detect anything.
To use it as a measure of something or a detector of something is to miss the point. This thread, where we go back and forth arguing about criteria, pretty much demonstrates said fact.
I think the Turing Test clearly does measure something: it measures how closely an agent’s behavior resembles that of a human. The real argument is not, “what does the test measure?”, but “is measuring behavior similarity enough for all intents and purposes, or do we need more?”
If we prefer to be pedantic, we must go further than that: the test measures whether an agent can fool some particular interrogator into having a no-better-than-chance probability of correctly discerning whether said agent is a human (in the case where the agent in question is not, in fact, a human).
How well that particular factor correlates with actual behavioral similarity to a human (and how would we define and measure such similarity? along what dimensions? operationalized how?), is an open question. It might, it might not. It might take advantage of some particular biases of the interrogator (e.g. pareidolia, the tendency to anthropomorphize aspects of the inanimate world, etc.) to make him/her see behavioral similarity where little exists (cf. Eliza and other chatbots).
(Remember, also, that Turing thought that a meaningful milestone would be for a computer to “play the imitation game so well that an average interrogator will not have more than 70 percent chance of making the right identification after five minutes of questioning.” ! [Emphasis mine.])
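As a toy illustration of the gap between “no better than chance” and Turing’s 70%-after-five-minutes milestone (my own sketch, with an assumed setup in which each session yields one binary human-vs-machine verdict from the judge):

    # Toy sketch: treat each five-minute session as one binary verdict,
    # then ask whether the judge's hit rate is distinguishable from guessing.
    from math import comb

    def p_beats_chance(correct: int, trials: int, chance: float = 0.5) -> float:
        """One-sided binomial tail: probability of at least `correct` right
        verdicts if the judge were merely guessing at rate `chance`."""
        return sum(comb(trials, i) * chance**i * (1 - chance)**(trials - i)
                   for i in range(correct, trials + 1))

    # A judge at exactly Turing's milestone (70% right) over 30 sessions:
    print(p_beats_chance(21, 30))  # ~0.02: such a judge still clearly beats chance

So Turing’s milestone is a weaker bar than literal indistinguishability: a machine can meet it while a persistent judge remains statistically detectable as better than a coin flip.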
I do partly agree with this:
The real argument is not, “what does the test measure?”, but “is measuring behavior similarity enough for all intents and purposes, or do we need more?”
And of course the question then becomes: just what are our intents and/or purposes here?
“play the imitation game so well that an average interrogator will not have more than 70 percent chance of making the right identification after five minutes of questioning.”
I think we’ve hit this milestone already, but we kind of cheated: in addition to just making computers smarter, we made human conversations dumber. Thus, if we wanted to stay true to Turing’s original criteria, we’d need to scale up our present-day requirements (say, to something like 80% chance over 60 minutes), in order to keep up with inflation.
And of course the question then becomes: just what are our intents and/or purposes here?
I can propose one relatively straightforward criterion: “can this agent take the place of a human on our social network graph?” By this I don’t simply mean, “can we friend it on Facebook”; that is, when I say “social network”, I mean “the overall fabric of our society”. This network includes relationships such as “friend”, “employee”, “voter”, “possessor of certain rights”, etc.
I think this is a pretty good criterion, and I also think that it could be evaluated in purely functional terms. We shouldn’t need to read an agent’s genetic/computer/quantum/whatever code in order to determine whether it can participate in our society; we can just give it the Turing Test, instead. In a way, we already do this with humans, all the time—only the test is administered continuously, and sometimes we get the answers wrong.
Agreed. Pretty much the only creative endeavour I’m capable of is writing computer code; and it’s not even entirely clear whether computer programming qualifies as “creative” in the first place. I’m a human, though, not an AI. I guess you’d have to take my word for it.