It’s a little funny that in our quest for a believably human conversation bot, we’ve ended up with conversations that are very much unhuman.
In no conversation would I meet someone and say, “oh hey, how many legs on a millipede?” They’d say to me “haha that’s funny, so are you from around here?” and I’d reply with “how many legs on an ant in Chernobyl?” And if they said to me, “sit here with your arms folded for 4 minutes then repeat this sentence back to me,” I wouldn’t do it. I’d say “why?” and fail right there.
Hmm … that, together with shminux’s xkcd link, gives me an idea for a test protocol: instead of having the judges interrogate subjects, the judges give each pair of subjects a discussion topic, à la Omegle’s “spy” mode:
Spy mode gives you and a stranger a random question to discuss. The question is submitted by a third stranger who can watch the conversation, but can’t join in.
...and the subjects get a set period of time to talk about it. At the end of that time, the judge rates the interestingness of each subject’s contribution, and each subject rates their partner. The ratings of confirmed-human subjects would be a basis for evaluating the judges, I presume (although you would probably want a trusted panel of experts to confirm this by inspecting live results), and any subjects from the unconfirmed pool who get high ratings would be selected for further consideration.
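To make the selection step concrete, here is a rough sketch of how the ratings might be tallied. Everything in it is my own assumption rather than part of the proposal: the `Rating` fields, the 1-to-5 scale, averaging judge and partner scores with equal weight, and the `threshold` cutoff are all hypothetical. It also leaves out the step of evaluating the judges themselves.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Rating:
    rater: str    # id of the judge or partner giving the score (hypothetical field)
    subject: str  # id of the subject being rated
    score: float  # interestingness, assumed here to be on a 1-5 scale


def mean_scores(ratings):
    """Average every score each subject received, judge and partner alike."""
    by_subject = defaultdict(list)
    for r in ratings:
        by_subject[r.subject].append(r.score)
    return {subject: mean(scores) for subject, scores in by_subject.items()}


def shortlist(ratings, confirmed_humans, threshold):
    """Return unconfirmed subjects whose mean score meets the assumed cutoff."""
    scores = mean_scores(ratings)
    return sorted(
        subject
        for subject, avg in scores.items()
        if subject not in confirmed_humans and avg >= threshold
    )


# Two made-up sessions: (alice, bob) watched by judge1, (carol, dave) by judge2.
ratings = [
    Rating("judge1", "alice", 4.0), Rating("bob", "alice", 5.0),
    Rating("judge1", "bob", 2.0), Rating("alice", "bob", 3.0),
    Rating("judge2", "carol", 4.0), Rating("dave", "carol", 4.0),
    Rating("judge2", "dave", 1.0), Rating("carol", "dave", 2.0),
]
confirmed_humans = {"alice", "dave"}
candidates = shortlist(ratings, confirmed_humans, threshold=4.0)
```

With this made-up data, only carol clears the cutoff among the unconfirmed subjects, so she would be the one passed on for further consideration.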
For the same reason that the test shouldn’t try to simulate a human with poor English skills, it also shouldn’t try to simulate a human who isn’t willing to cooperate with a questioner. A random human off the street wouldn’t answer the millipede question, but a random human recruited to take part in an experiment and told to answer reasonable questions probably would.