First, do we now have an example of an AI not using cognitive capacities that it has, because the ‘face’ it’s presenting wouldn’t have those capacities?
This does seem like an interesting question. But I think we should be careful to measure against the task we actually asked the system to perform.
For example, if I ask my system to produce a cartoon drawing, it doesn’t seem very notable if I get a cartoon as a result rather than a photorealistic image, even if it could have produced the latter.
Maybe this just means we should track what the user understands the task to be. If the user thinks of it as “play a (not very smart) character who’s asked to do this task”, they’ll have a pretty different understanding of what’s going on than if they think of it as “do this task.”
I think what’s notable in the example in the post is not that the AI is being especially deceptive, but that the user is especially likely to misunderstand the task (compared to tasks that don’t involve dialogues with characters).
> For example, if I ask my system to produce a cartoon drawing, it doesn’t seem very notable if I get a cartoon as a result rather than a photorealistic image, even if it could have produced the latter.
Consider instead a scenario where I show a model a photo of a face, and it produces a photo of that face from the side. An interesting question is: “Is there a 3D representation of the face in the model?” It could be getting the right answer that way, or it could be getting it some other way.
Similarly, when it models a ‘dumb’ character, is it calculating the right answer and then introducing an error on top? Or is it just doing something dumb that happens to come out wrong?
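For concreteness, here is a minimal sketch of one probing experiment that would bear on this, assuming a Hugging Face causal LM (gpt2 purely as a stand-in) and a toy set of addition prompts. The model, layer choice, prompts, and linear probe are all illustrative assumptions, not a claim about how the system discussed in the post actually works.

```python
# Sketch: is the *correct* answer linearly decodable from an intermediate layer,
# even on prompts where the model's own sampled output is wrong?
# Assumptions: Hugging Face transformers + scikit-learn; gpt2 as a stand-in model;
# toy two-digit addition prompts. Purely illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.linear_model import LogisticRegression

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

LAYER = 6  # arbitrary middle layer; in practice you would sweep over layers

def last_token_state(prompt: str) -> torch.Tensor:
    """Hidden state of the final prompt token at LAYER."""
    inputs = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    return out.hidden_states[LAYER][0, -1]  # shape: (hidden_dim,)

# Toy dataset: addition prompts labelled with the *true* sums.
pairs = [(a, b) for a in range(10, 20) for b in range(10, 20)]
prompts = [f"Q: What is {a} + {b}?\nA:" for a, b in pairs]
labels = [a + b for a, b in pairs]

X = torch.stack([last_token_state(p) for p in prompts]).numpy()
probe = LogisticRegression(max_iter=2000).fit(X, labels)
print("probe accuracy on its training set:", probe.score(X, labels))
```

The interesting split would be probe accuracy restricted to prompts where the model’s own sampled answer is wrong: if the correct sum is still decodable there, that is at least weak evidence the right answer was computed and then overridden, rather than never computed at all.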
Like, when you look at this example, how did it come up with 19 and 20? What would it take to make tools that could answer that question?
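I don’t know the actual prompt behind the 19 and 20, but as a gesture at what such a tool could look like, here is a “logit lens”-style sketch: decode every intermediate layer through the model’s own final layer norm and unembedding, and watch where candidate answer tokens gain or lose probability. The model (gpt2), the prompt, and the candidate tokens below are placeholder assumptions.

```python
# Sketch of a "logit lens"-style tool: decode each intermediate layer through the
# unembedding and track the probability of candidate answer tokens.
# Assumptions: gpt2 stand-in model, made-up prompt and candidates; illustrative only.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "Q: What is 12 + 7?\nA:"   # placeholder prompt, not the one from the post
candidates = [" 19", " 20"]         # answer tokens to track (illustrative)
cand_ids = [tok(c, add_special_tokens=False).input_ids[0] for c in candidates]

inputs = tok(prompt, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

for layer, h in enumerate(out.hidden_states):
    # Decode this layer's last-position state through the final layer norm
    # and the unembedding matrix ("logit lens").
    logits = model.lm_head(model.transformer.ln_f(h[0, -1]))
    probs = torch.softmax(logits, dim=-1)
    report = ", ".join(
        f"{c!r}: {probs[i].item():.4f}" for c, i in zip(candidates, cand_ids)
    )
    print(f"layer {layer:2d}  {report}")
```

A trace where the correct answer dominates in the middle layers and then loses out to the ‘dumb’ answer near the output would be one concrete signature of “computed the right answer, then overrode it for the character.”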
This framing makes sense to me. Thanks!