If you told me personally to do that, I may not pass the test, either. And I personally know several humans who cannot, f.ex., “use other software purposefully”. I think these kinds of challenges are a form of scope creep. We are not trying to test whether the AI is a polymath, just whether it’s human or not.
I don’t believe it’s scope creep at all. The requirement isn’t really “make a video”. The requirement is “be able to do some of the things in the category ‘human activities that are hard to automate’”. Making a video is a specific item in the category, and the test is not to see that someone can do any specific item in the category, just that they can do some of them. If the human questioner gets told “I don’t know how to make a video”, he’s not going to say “okay, you’re a computer”, he’s going to ask “okay, then how about you do this instead?”, picking another item from the category.
(Note that the human is able to ask the subject to do another item from the category without the human questioner being able to list all the items in the category in advance.)
The requirement is “be able to do some of the things in the category ‘human activities that are hard to automate’”
That is starting to sound like a “Turing Test of the gaps”.
“Chatting online is really hard to automate, let’s test for that. Ok, we’ve automated chatting, let’s test for musical composition, instead. Ok, looks like there are AIs that can do that. Let’s test it for calculus...”
My tests would be: have a chatterbot do calculus. Have a musical bot chat. Have a calculus bot do music. To test for general intelligence, you can’t test on the specific skill the bot’s trained in.
Try to teach the competitor to do some things that make sense to humans and some things that do not make sense to humans, from wildly different fields. If the competitor seems to be confused by things which are confusing to people and learns things which are not confusing, it is more likely to be thinking instead of parroting.
For example, you could explain why no consistent logical system can trust itself, and then ask the competitor if they think their way of thinking is consistent. If they think it isn’t, ask them if they think that they could prove literally anything using their way of thinking. If they think it is, ask them if they would believe everything that they can prove to be true.
Thinking entities will tend to believe that they can’t prove things which are false, and thus that everything that they can prove is true. Calculating entities run into trouble with those concepts.
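(For reference, the two standard results lurking behind this, stated informally for a consistent, sufficiently strong formal theory $T$:)
\[ \text{Gödel II:} \quad T \nvdash \mathrm{Con}(T) \]
\[ \text{Löb:} \quad \text{if } T \vdash \mathrm{Prov}_T(\ulcorner P \urcorner) \rightarrow P, \text{ then } T \vdash P \]
So a consistent reasoner cannot endorse “everything I can prove is true” across the board: by Löb’s theorem, proving $\mathrm{Prov}_T(\ulcorner P \urcorner) \rightarrow P$ for every sentence $P$ would mean proving every $P$, i.e. being inconsistent.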
Less meta, one could explain the magical thinking expressed in The Secret and ask why some people believe it and others don’t, along with asking why the competitor does or doesn’t.
To test for general intelligence, you can’t test on the specific skill the bot’s trained in.
I think we might have different definitions of what “general intelligence” is. I thought it meant something like, “being able to solve novel problems in some domain”; in this case, our domain is “human conversation”. I may be willing to extend the definition to say, “...and also possessing the capacity to learn how to solve problems in some number of other domains”.
Your definition, though, seems to involve solving problems in any domain. I think this definition is too broad. No human is capable of doing everything; and most humans are only good at a small number of things. An average mathematician can’t compose music. An average musician can’t do calculus. Some musicians can learn calculus (given enough time and motivation), but others cannot. Some mathematicians can learn to paint; others cannot.
Perhaps you mean to say that humans are not generally intelligent, and neither are AIs who pass the Turing Test? In this case, I might agree with you.
(Most) humans possess a certain level of general intelligence. Human groups, augmented by automation tools, and given enough time, possess a much more advanced general intelligence. The “no free lunch theorems” imply that it’s impossible to get a fully general intelligence in every environment, but we come pretty close.
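(For reference, the optimization form of the theorem, due to Wolpert and Macready, stated roughly: for any two search algorithms $a_1$ and $a_2$,
\[ \sum_{f} P\big(d^y_m \mid f, m, a_1\big) \;=\; \sum_{f} P\big(d^y_m \mid f, m, a_2\big), \]
where the sum ranges over all objective functions $f$ and $d^y_m$ is the sequence of objective values seen after $m$ evaluations. Averaged over every possible environment, no algorithm beats any other; any edge has to come from the environments actually encountered being a restricted, structured subset.)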
I’ve somewhat refined my views of what would count as general intelligence in a machine; now I require mainly that it not be extremely stupid in any area that humans possess minimal competence at. Out-of-domain tests are implicit ways of testing for this, without doing the impossible task of testing the computer in every environment.
I’m not sure what that criticism is trying to say.
Assuming it’s an analogy with the god of the gaps, you might be saying that if the computer can pass the test the questioner can always pick a new requirement that he knows the computer can’t pass.
If this is what you are saying, then it’s wrong because of the flip side of the previous argument: just like the test doesn’t check to see if the subject succeeds in any specific item, it also doesn’t check to see if the subject fails in any specific item. In order for a computer to fail the test because of inability to do something, it has to show a pattern of inability that is different from the pattern of inability that a human would show. The questioner can’t just say “well, computers aren’t omniscient, so I know there’s something the computer will fail at”, pick that, and automatically fail the computer—you don’t fail because you failed one item.
Assuming it’s an analogy with the god of the gaps, you might be saying that if the computer can pass the test the questioner can always pick a new requirement that he knows the computer can’t pass.
Yep, this is it. I see no reason why we should hold computers to a higher standard than we do our fellow humans.
In order for a computer to fail the test because of inability to do something, it has to show a pattern of inability that is different from the pattern of inability that a human would show.
I’m not sure what kind of a “pattern of inability” an average human would show; I’m not even convinced that such a pattern exists in a non-trivial sense (f.ex., the average human surely cannot fly by will alone, but it would be silly to test for that).
All the tests that have been proposed so far, such as “create a video” or “compose a piece of music”, target a relatively small subset of humans who are capable of such tasks. Thus, we would in fact expect the average human to say, “sorry, I have no ear for music” or something of the sort—which is also exactly what an AI would say (unless it was actually capable of the task, of course). Many humans would attempt the task but fail; the AI could do that, too (by design or by accident). So, the new tests don’t really tell you much.
No computer is going to fail the Turing test because it can’t compose a piece of music. The questioner might ask it to do that, but if it replies “sorry, I have no ear for music” it doesn’t fail—the questioner then picks something else. If the computer can’t do that either, and if the questioner keeps picking such things, he may eventually get to the point where he says “Okay, I know there are some people who have no ear for music, but there aren’t many people who have no ear for music and can’t make a video and can’t paint a picture and…” He will then fail the computer because although it is plausible that a human can’t do each individual item, it’s not very plausible that the human can’t do anything in the list. No specific item is a requirement to be a human, and no specific inability marks the subject as not being human.
if the questioner keeps picking such things, he may eventually get to the point where he says “Okay, I know there are some people who have no ear for music, but there aren’t many people who have no ear for music and can’t make a video and can’t paint a picture and…” He will then fail the computer because although it is plausible that a human can’t do each individual item, it’s not very plausible that the human can’t do anything in the list.
But if all the things in that conjunction are creative endeavors, why do you think a human not being able to do any of them is implausible? I have no ear for music, don’t have video-creation skills, can’t paint a picture, can’t write a poem, etc. There are many similar people, whose talents lie elsewhere, or perhaps who are just generally low on the scale of human talent.
If you judge such people to be computers, then your success rate as a judge in a Turing test will be unimpressive.
If the questioner is competent, he won’t pick a list where it’s plausible that some human can’t do anything on the list. If he does pick such a list, he’s performing the questioning incompetently. I think implicit in the idea of the test is that we have to assume some level of competency on the part of the questioner; there are many more ways an incompetent questioner could fail to detect humans than just asking for a bad set of creative endeavors.
(I think the test also assumes most people are competent enough to administer the test, which also implies that the above scenario won’t happen. I think most people know that there are non-creative humans and won’t give a test that consists solely of asking for creative endeavors—the things they ask the subject to do will include both creative and non-creative but human-specific things.)
I think this entire thread is caused by, and demonstrates, the fact that we increasingly have no idea what the heck we’re even trying to measure or detect with the Turing test (is it consciousness? human-level intelligence? general intelligence? what?) …
… which is entirely unsurprising, since as I say in another comment on this post, the Turing test isn’t meant to measure or detect anything.
To use it as a measure of something or a detector of something is to miss the point. This thread, where we go back and forth arguing about criteria, pretty much demonstrates said fact.
I think the Turing Test clearly does measure something: it measures how closely an agent’s behavior resembles that of a human. The real argument is not, “what does the test measure?”, but “is measuring behavior similarity enough for all intents and purposes, or do we need more?”
If we prefer to be pedantic, we must go further than that: the test measures whether an agent can fool some particular interrogator into having a no-better-than-chance probability of correctly discerning whether said agent is a human (in the case where the agent in question is not, in fact, a human).
How well that particular factor correlates with actual behavioral similarity to a human (and how would we define and measure such similarity? along what dimensions? operationalized how?), is an open question. It might, it might not. It might take advantage of some particular biases of the interrogator (e.g. pareidolia, the tendency to anthropomorphize aspects of the inanimate world, etc.) to make him/her see behavioral similarity where little exists (cf. Eliza and other chatbots).
(Remember, also, that Turing thought that a meaningful milestone would be for a computer to “play the imitation game so well that an average interrogator will not have more than 70 percent chance of making the right identification after five minutes of questioning”! [Emphasis mine.])
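(As a concrete illustration of the “no-better-than-chance” reading, here is a minimal sketch, entirely my own and not anything from Turing, of how one might check whether a given interrogator’s verdicts beat the 50% chance baseline, treating each verdict as an independent Bernoulli trial:)
```python
# A hypothetical helper (names are my own): one-sided binomial test of whether
# an interrogator's hit rate exceeds the 50% chance baseline.
from math import comb

def p_value_above_chance(hits: int, trials: int, baseline: float = 0.5) -> float:
    """P(X >= hits) for X ~ Binomial(trials, baseline)."""
    return sum(
        comb(trials, k) * baseline**k * (1 - baseline) ** (trials - k)
        for k in range(hits, trials + 1)
    )

# Example: 14 correct identifications in 20 five-minute sessions (a 70% hit rate).
print(p_value_above_chance(14, 20))  # ~0.058: not clearly better than chance
```
Under these (admittedly simplified) assumptions, even Turing’s 70% figure is statistically indistinguishable from chance over 20 sessions, which is one reason the “some particular interrogator” caveat above matters.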
I do partly agree with this:
The real argument is not, “what does the test measure?”, but “is measuring behavior similarity enough for all intents and purposes, or do we need more?”
And of course the question then becomes: just what are our intents and/or purposes here?
“play the imitation game so well that an average interrogator will not have more than 70 percent chance of making the right identification after five minutes of questioning.”
I think we’ve hit this milestone already, but we kind of cheated: in addition to just making computers smarter, we made human conversations dumber. Thus, if we wanted to stay true to Turing’s original criteria, we’d need to scale up our present-day requirements (say, to something like 80% chance over 60 minutes), in order to keep up with inflation.
And of course the question then becomes: just what are our intents and/or purposes here?
I can propose one relatively straightforward criterion: “can this agent take the place of a human on our social network graph?” By this I don’t simply mean, “can we friend it on Facebook”; that is, when I say “social network”, I mean “the overall fabric of our society”. This network includes relationships such as “friend”, “employee”, “voter”, “possessor of certain rights”, etc.
I think this is a pretty good criterion, and I also think that it could be evaluated in purely functional terms. We shouldn’t need to read an agent’s genetic/computer/quantum/whatever code in order to determine whether it can participate in our society; we can just give it the Turing Test, instead. In a way, we already do this with humans, all the time—only the test is administered continuously, and sometimes we get the answers wrong.
Agreed. Pretty much the only creative endeavour I’m capable of is writing computer code; and it’s not even entirely clear whether computer programming qualifies as “creative” in the first place. I’m a human, though, not an AI. I guess you’d have to take my word for it.