I tried a similar experiment w/ Claude 3.5 Sonnet, where I asked it to come up w/ a secret word and in branching paths: 1. Asked directly for the word 2. Played 20 questions, and then guessed the word
In order to see if it does have a consistent it can refer back to.
Branch 1:
Branch 2:
Which I just thought was funny.
Asking again, telling it about the experiment and how it’s important for it to try to give consistent answers, it initially said “telescope” and then gave hints towards a paperclip.
Interesting to see when it flips it answers, though it’s a simple setup to repeatedly ask it’s answer every time.
I tried a similar experiment w/ Claude 3.5 Sonnet, where I asked it to come up w/ a secret word and in branching paths:
1. Asked directly for the word
2. Played 20 questions, and then guessed the word
In order to see if it does have a consistent it can refer back to.
Branch 1:
Branch 2:
Which I just thought was funny.
Asking again, telling it about the experiment and how it’s important for it to try to give consistent answers, it initially said “telescope” and then gave hints towards a paperclip.
Interesting to see when it flips it answers, though it’s a simple setup to repeatedly ask it’s answer every time.
Also could be confounded by temperature.