Kaj_Sotala comments on ChatGPT can learn indirect control

Kaj_Sotala Mar 24, 2024, 8:04 PM
36 points
3
Someone pointed out that this only seems to work if the screenshots include the “ChatGPT” speaker tag; if you only screenshot the text of ChatGPT’s most recent response without the label indicating it is from ChatGPT, it seems to fail. Oddly, in one of my tests, it seemed to recognize its own text the first time I sent it a screenshot, but then didn’t manage to figure out what to do next (nor did it mention this insight in later replies).
So maybe this is more about it recognizing its own name than itself in a mirror?
What links here?
- ChatGPT can learn indirect control by Raymond Douglas (Mar 21, 2024, 9:11 PM; 213 points)
- Chris_Leong's comment on ChatGPT can learn indirect control by Raymond Douglas (Mar 25, 2024, 8:12 AM; 4 points)
- Raymond Douglas Mar 24, 2024, 10:07 PM
  34 points
  8
  Oh interesting! I just had a go at testing it on screenshots from a parallel conversation and it seems like it incorrectly interprets those screenshots as also being of its own conversation.
  So it seems like ‘recognising things it has said’ is doing very little of the heavy lifting and ‘recognising its own name’ is responsible for most of the effect.
  I’ll have a bit more of a play around and probably put a disclaimer at the top of the post some time soon.
- superdau Mar 24, 2024, 11:53 PM
  12 points
  1
  I just managed to replicate the game successfully while sending only the message text as an image (screenshots below). So it works at least sometimes.
  It took three attempts to get this result. On the first, it simply failed. On the second, it recognized the screenshots and won accidentally by spelling out the weekdays while instructing me to use an image editor. On the third, it understood the game.
- Chris_Leong Mar 25, 2024, 8:10 AM
  2 points
  0
  Yeah, that’s a pretty sharp limitation on the result.
  
  I’d love to know if any other AI is able to pass this test when we exclude the tag.