I’m not buying the premise. Passing the Turing test requires fooling an alert, smart person who is deliberately probing the limits of the system. ChatGPT isn’t at that level.
A specially tuned persona optimized for this task might do better than the “assistant” persona we have available now, but the model is currently incapable of holding a conversation without going off on long, unwanted tangents, getting trapped in loops, and so on.
That could well be. Do you think there is a place for a partial Turing test, as in the Loebner Prize, to determine how close the system is to human intelligence, even if it has not reached that level?
I definitely think so. The Turing test is a very hard target to hit, and we don’t really have a good idea of how to measure IQ, knowledge, human-likeness, and so on. I notice a lot of confusion, anthropomorphizing, and bad analogies in public discourse right now. To me it feels like the conversation is at a level where we need more precise measures that apply to both humans and machines. Benchmarks based on specific tasks (as found in AI papers) don’t cut it.
(ep status: speculative)
Potentially, AI safety folks are better positioned to work on these foundational issues than traditional academics, who are very focused on capabilities and applications right now.