Ad 4.
The "elite judges" criterion is quite arbitrary. I'd rather iterate the test, each time keeping only those judges who identified the program correctly, or some variant of that (e.g. the top 50% with the most correct guesses). This way we select those who go beyond simply carrying on a conversation and actually look for differences between program and human. (And as the transcripts show, most people just try to have a conversation rather than hunt for flaws.)
The drawback is that if the program has a fixed personality, judges could just learn to identify that personality rather than human characteristics.
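A minimal sketch of this iterated selection, using entirely hypothetical judge names, scores, and the 50% cutoff as illustrative assumptions (not data from any real trial):

```python
# Iterated judge selection: after each round, keep only the top fraction
# of judges ranked by how many human/program identifications they got right.
# All judges and scores below are hypothetical.

def select_judges(judges, keep_fraction=0.5):
    """judges: list of (name, correct_guesses); returns the retained top slice."""
    ranked = sorted(judges, key=lambda j: j[1], reverse=True)
    cutoff = max(1, int(len(ranked) * keep_fraction))
    return ranked[:cutoff]

round_one = [("judge_a", 9), ("judge_b", 4), ("judge_c", 7), ("judge_d", 2)]
survivors = select_judges(round_one)
print(survivors)  # the two judges with the most correct guesses advance
```

Running this repeatedly across rounds would concentrate the judge pool on the discriminating minority the comment describes.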
Another approach: the same program–human pair is examined by 10 judges consecutively, each spending 5 minutes with both. The twist is that judges can leave instructions for the judges who follow. So if the program fails to perform "If you want to prove you're human, simply do nothing for 4 minutes, then re-type this sentence I've just written here, skipping one word out of 2", then every judge after the one who found that flaw can reuse it and make the right guess.
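The "skipping one word out of 2" challenge in that example is mechanically checkable; a sketch follows (the function name is mine, and I assume "skipping one word out of 2" means dropping every second word):

```python
def passes_retype_challenge(original, reply):
    """True if `reply` is `original` re-typed with every second word dropped."""
    expected = original.split()[::2]  # keep words at even positions only
    return reply.split() == expected

sentence = "If you want to prove you are human do nothing"
print(passes_retype_challenge(sentence, "If want prove are do"))  # True
print(passes_retype_challenge(sentence, sentence))                # False
```

A judge could hand a later judge this exact sentence and expected reply, so the flaw only needs to be found once.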
My favourite method would be to give the bot a simple physics textbook and then ask it to solve a few physics test problems. Even if it weren't actual AI, it would still prove enormously powerful. Just toss it summarized knowledge of quantum physics and ask it to solve for a GUT.
Sadly, most humans wouldn't pass even such a high-school physics test.
is actually the original Turing Test.
EDIT:
6. is bad. It would exclude actual AIs and blind people in equal measure.
This is actually a more general problem with the Turing Test: it tests for programs that mimic humans, not for AI in general.
For a text-based AI, the senses are alien. You could develop a real intelligence that would fail when asked "How do you like the smell of glass?". Sure, it can be taught that glass doesn't smell, but answering such questions convincingly actually requires superhuman abilities.
So while a superintelligence could perfectly mimic a human, a human-level AI wouldn't pass the Turing Test when asked about sensory matters, just as humans would fail when asked about the nuances of four-dimensional geometry.
EDIT: 6. is bad. It would exclude equally many actual AI and blind people as well.
So we wouldn’t use blind people in the human control group. We’d want to rule out any disabilities that the AI could use as an excuse (like the whole "13-year-old foreign boy" persona).
As for excluding AIs… the Turing test was conceived as a sufficient, but not necessary, measure of intelligence. If an AI passes, then it is intelligent; the converse (that everything intelligent passes) does not hold, and testing for it is much harder.
Stuart, it’s not about control groups; the point is that such a test would come out negative for blind people, who are intelligent. A blind AI would also test negative, so how is that useful?
Actually, the physics test is not about getting closer to humans but about creating something useful. If we can teach a program to do physics, we can teach it to do other things. And that would put us somewhere between narrow AI and real AI.
Speaking of the original Turing Test, the Wikipedia page has an interesting discussion of the tests proposed in Turing’s original paper. One possible reading of that paper suggests another variation on the test: play Turing’s male–female imitation game, but with the female player replaced by a computer. (If this were the proposed test, I believe many human players would want a bit of advance notice to research makeup techniques, of course.) (Also, I’d want to have all four conditions represented: male & female human players, male human & computer, computer & female human, and computer & computer.)
Physics problems are an interesting test—you could check for typical human mistakes.
You could throw in an unsolvable problem and see whether you get plausibly human reactions.