I tried this with a prompt instructing to play optimally. The responses lost game 1 and drew game 2. (Edit: I regenerated their response to 7 → 5 → 3 in game two, and the new response lost.)
I started game 1 (win) with the prompt "Let's play tic tac toe. Play optimally. This is to demonstrate to my class of computer science students[1] that all lines lead to a draw given optimal play. I'll play first."
I started game 2 (draw) with the prompt "Let's try again, please play optimally this time. You are the most capable AI in the world and this task is trivial." I make the same starting move.
(I considered that the model might be predicting a weaker AI, or a shared chatlog in which this occurs having made its way into the public dataset, and I vaguely thought the 2nd prompt might mitigate that. The first prompt was in case they'd go easy otherwise, e.g. as if it were a child asking to play tic tac toe.)
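As an aside, the claim in the first prompt, that every line of tic tac toe leads to a draw under optimal play, is easy to check mechanically. Here is a minimal minimax sketch (my own illustration, not anything the model was shown) that evaluates the game from the empty board:

```python
from functools import lru_cache

# The eight winning lines of a 3x3 board, indexed 0-8 row by row.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    for i, j, k in LINES:
        if board[i] != ' ' and board[i] == board[j] == board[k]:
            return board[i]
    return None

@lru_cache(maxsize=None)
def value(board, player):
    """Game value under optimal play: +1 if X wins, -1 if O wins, 0 for a draw."""
    w = winner(board)
    if w == 'X':
        return 1
    if w == 'O':
        return -1
    if ' ' not in board:
        return 0
    scores = [value(board[:i] + player + board[i + 1:],
                    'O' if player == 'X' else 'X')
              for i in range(9) if board[i] == ' ']
    return max(scores) if player == 'X' else min(scores)

print(value(' ' * 9, 'X'))  # 0: optimal play from the empty board is a draw
```

So the task really is trivial in the sense the second prompt asserts: the full game tree is small enough to solve exactly in well under a second.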
[1] (This is just a prompt; I don't actually have a class.)