First, do we now have an example of an AI not using cognitive capacities that it has, because the ‘face’ it’s presenting wouldn’t have those capacities?
This does seem like an interesting question. But I think we should be careful to measure against the task we actually asked the system to perform.
For example, if I ask my system to produce a cartoon drawing, it doesn’t seem very notable if I get a cartoon as a result rather than a photorealistic image, even if it could have produced the latter.
Maybe this just means we should track what the user understands the task to be. If the user thinks of it as “play a (not very smart) character who’s asked to do this task”, they’ll have a pretty different understanding of what’s going on than if they think of it as “do this task.”
I think what’s notable in the example in the post is not that the AI is being especially deceptive, but that the user is especially likely to misunderstand the task (compared to tasks that don’t involve dialogues with characters).
> For example, if I ask my system to produce a cartoon drawing, it doesn’t seem very notable if I get a cartoon as a result rather than a photorealistic image, even if it could have produced the latter.
Consider instead a scenario where I show a model a photo of a face, and it produces a photo of that face from the side. An interesting question is: “Is there a 3D representation of the face in the model?” It could be getting the right answer that way, or it could be getting it some other way.
Similarly, when it models a ‘dumb’ character, is it calculating the right answer and then introducing an error on top? Or is it just doing something dumb that happens to come out wrong?
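For concreteness, here is a minimal sketch of one probing experiment that would bear on this, assuming a Hugging Face causal LM (gpt2 purely as a stand-in) and a toy set of addition prompts. The model, layer choice, prompts, and linear probe are all illustrative assumptions, not a claim about how the system discussed in the post actually works.

```python
# Sketch: is the *correct* answer linearly decodable from an intermediate layer,
# even on prompts where the model's own sampled output is wrong?
# Assumptions: Hugging Face transformers + scikit-learn; gpt2 as a stand-in model;
# toy two-digit addition prompts. Purely illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.linear_model import LogisticRegression

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

LAYER = 6  # arbitrary middle layer; in practice you would sweep over layers

def last_token_state(prompt: str) -> torch.Tensor:
    """Hidden state of the final prompt token at LAYER."""
    inputs = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    return out.hidden_states[LAYER][0, -1]  # shape: (hidden_dim,)

# Toy dataset: addition prompts labelled with the *true* sums.
pairs = [(a, b) for a in range(10, 20) for b in range(10, 20)]
prompts = [f"Q: What is {a} + {b}?\nA:" for a, b in pairs]
labels = [a + b for a, b in pairs]

X = torch.stack([last_token_state(p) for p in prompts]).numpy()
probe = LogisticRegression(max_iter=2000).fit(X, labels)
print("probe accuracy on its training set:", probe.score(X, labels))
```

The interesting split would be probe accuracy restricted to prompts where the model’s own sampled answer is wrong: if the correct sum is still decodable there, that is at least weak evidence the right answer was computed and then overridden, rather than never computed at all.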
Like, when you look at this example, how did it come up with 19 and 20? What would it take to make tools that could answer that question?
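I don’t know the actual prompt behind the 19 and 20, but as a gesture at what such a tool could look like, here is a “logit lens”-style sketch: decode every intermediate layer through the model’s own final layer norm and unembedding, and watch where candidate answer tokens gain or lose probability. The model (gpt2), the prompt, and the candidate tokens below are placeholder assumptions.

```python
# Sketch of a "logit lens"-style tool: decode each intermediate layer through the
# unembedding and track the probability of candidate answer tokens.
# Assumptions: gpt2 stand-in model, made-up prompt and candidates; illustrative only.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "Q: What is 12 + 7?\nA:"   # placeholder prompt, not the one from the post
candidates = [" 19", " 20"]         # answer tokens to track (illustrative)
cand_ids = [tok(c, add_special_tokens=False).input_ids[0] for c in candidates]

inputs = tok(prompt, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

for layer, h in enumerate(out.hidden_states):
    # Decode this layer's last-position state through the final layer norm
    # and the unembedding matrix ("logit lens").
    logits = model.lm_head(model.transformer.ln_f(h[0, -1]))
    probs = torch.softmax(logits, dim=-1)
    report = ", ".join(
        f"{c!r}: {probs[i].item():.4f}" for c, i in zip(candidates, cand_ids)
    )
    print(f"layer {layer:2d}  {report}")
```

A trace where the correct answer dominates in the middle layers and then loses out to the ‘dumb’ answer near the output would be one concrete signature of “computed the right answer, then overrode it for the character.”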
This framing makes sense to me. Thanks!