Here’s a possible way to prove pretending-to-be-stupid maybe: we could try to prompt it in such a way that all its answers (to true/false questions) are wrong, much more often than chance. If it’s able to do that, then we can ask, how? One possibility is: It is implicitly figuring out the truth and then saying the opposite. If we’re careful, maybe we can set things up such that that’s the only possibility.
(I’m not convinced that such a demo would really teach us anything that wasn’t obvious, but I dunno.)
Here’s a possible way to prove pretending-to-be-stupid maybe: we could try to prompt it in such a way that all its answers (to true/false questions) are wrong, much more often than chance. If it’s able to do that, then we can ask, how? One possibility is: It is implicitly figuring out the truth and then saying the opposite. If we’re careful, maybe we can set things up such that that’s the only possibility.
(I’m not convinced that such a demo would really teach us anything that wasn’t obvious, but I dunno.)