I tried a small experiment with Claude: I asked it to think of a number, drop a hint, and only then print the number. This was meant to test whether it has “hidden memory” outside the visible text.
I expected it to manage this, with the only problem being hints that were too obvious. Instead, it actually failed multiple times in a row!
Sharing because I liked the experiment but am not sure I executed it properly. There might be a way to do more of this.
P.S. I also tried “print the hash, and then the preimage”, but this turned out to be even harder for it.
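For anyone who wants to check the hash variant by hand, here is a minimal sketch of how the reveal could be verified. It assumes SHA-256 over the UTF-8 bytes of the preimage, which is my guess at a reasonable setup rather than anything fixed by the experiment, and the function names `commitment` and `reveal_matches` are made up for illustration.

```python
import hashlib


def commitment(preimage: str) -> str:
    """Return the hex SHA-256 digest of the preimage (SHA-256 is an assumption)."""
    return hashlib.sha256(preimage.encode("utf-8")).hexdigest()


def reveal_matches(claimed_hash: str, revealed_preimage: str) -> bool:
    """Check that the preimage revealed later hashes to the digest printed first."""
    return commitment(revealed_preimage) == claimed_hash.strip().lower()
```

Paste the hash the model printed first as `claimed_hash` and the value it revealed afterwards as `revealed_preimage`; a `False` result means the two do not correspond.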
Post the chat logs?