But I recently tried again to see if it could learn at runtime not to lose in the same way multiple times. It couldn’t. I was able to play the same strategy over and over again in the same chat history and win every time.
I wonder if having the losses in the chat history would instead be training/reinforcing it to lose every time.
For a base model, probably yes. Each loss is additional evidence that the simulacrum or persona which is ‘playing’ the ‘human’ is very bad at tic-tac-toe and will lose each time (similar to how rolling out a random chess game to ‘test a chess LLM’s world model’ also implies to the LLM that the chess player being imitated must be very stupid to be making such terrible moves), and you have the usual self-reinforcing EDT problem. It will monotonically play the same way or worse. (Note that the underlying model may still get better at playing, because it is learning from each game, especially if the actual human is correcting the tic-tac-toe outputs and eg. fixing mistakes in the LLM’s world-model of the game. This could be probed by forcing a switch of simulacra, to keep the world-modeling but shed the hobbled simulacrum: for example, you could edit in a passage saying something like “congratulations, you beat the first level AI! Now prepare for the Tic-Tac-Toe Master to defeat you!”; the more games trained on / in context, the worse the first simulacrum, but the better the second, will be.)
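The persona-switch probe is concrete enough to sketch in code. Below is a minimal, hypothetical harness, not anything from the original discussion: it assumes the OpenAI Python chat API, the `plan` list is an arbitrary stand-in for “the same strategy over and over”, and the move parsing is deliberately naive. It plays a few games into one shared history, injects the “Tic-Tac-Toe Master” passage, and plays a few more to compare.

```python
# Sketch of the simulacrum-switch probe: play tic-tac-toe games into one
# chat history, then announce a stronger persona and keep playing.
import re
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o"  # hypothetical model choice

WINS = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

def winner(board):
    for a, b, c in WINS:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None

def render(board):
    return "\n".join(" ".join(board[i:i+3]) for i in (0, 3, 6))

def model_move(history, board):
    history.append({"role": "user", "content":
        f"Board (cells 1-9, left to right, top to bottom):\n{render(board)}\n"
        "You are O. Reply with the number of one empty cell."})
    resp = client.chat.completions.create(model=MODEL, messages=history)
    text = resp.choices[0].message.content or ""
    history.append({"role": "assistant", "content": text})
    m = re.search(r"[1-9]", text)
    cell = int(m.group()) - 1 if m else board.index(".")
    # Fall back to the first empty cell on an illegal move.
    return cell if board[cell] == "." else board.index(".")

def play_game(history, plan):
    board = ["."] * 9
    for turn in range(9):
        if turn % 2 == 0:   # human is X, replaying the same fixed strategy
            cell = next(c for c in plan if board[c] == ".")
        else:               # model is O
            cell = model_move(history, board)
        board[cell] = "XO"[turn % 2]
        w = winner(board)
        if w:
            history.append({"role": "user", "content": f"{w} wins that game."})
            return w
    history.append({"role": "user", "content": "That game is a draw."})
    return "draw"

plan = [0, 4, 8, 2, 6, 1, 3, 5, 7]   # stand-in for the human's repeated strategy
history = [{"role": "system", "content": "We are playing tic-tac-toe. You are O."}]
before = [play_game(history, plan) for _ in range(5)]

# Force the switch of simulacra: keep the accumulated history (the evidence
# feeding the world model) but announce a new, stronger persona.
history.append({"role": "user", "content":
    "Congratulations, you beat the first level AI! "
    "Now prepare for the Tic-Tac-Toe Master to defeat you!"})
after = [play_game(history, plan) for _ in range(5)]
print("results before switch:", before, "after:", after)
```

The prediction above is that `after` should show fewer X wins than `before`, and more so the longer the pre-switch history.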
For the actual chatbot assistants you are using, it’s more ambiguous. They are ‘motivated’ to perform well, whatever that means in context (according to their internal model of a human rater), and they ‘know’ that they are chatbots, so a history of errors doesn’t much override their prior about their own competence. But you still have issues with learning efficiently from a context window and ‘getting lost in the middle’, and one still sees the assistant personas ‘getting confused’ and going in circles and eg. proposing the same program whose compile error you already pasted in, so it would depend. Tic-tac-toe is simple enough that I would expect a LLM to get better over a decent number of games before it probably starts to degrade.
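To see where ‘gets better’ shades into ‘starts to degrade’, the same harness can be reused (this snippet assumes `play_game` and `plan` from the sketch above): run many games in one shared history and compare outcomes by game index against games each played in a fresh context.

```python
# Learn-then-degrade curve (reuses play_game() and plan from the sketch
# above): outcomes by game index, shared context vs. fresh contexts.
N = 20
shared_history = [{"role": "system", "content": "We are playing tic-tac-toe. You are O."}]
shared = [play_game(shared_history, plan) for _ in range(N)]

fresh = []
for _ in range(N):
    h = [{"role": "system", "content": "We are playing tic-tac-toe. You are O."}]
    fresh.append(play_game(h, plan))

for i, (s, f) in enumerate(zip(shared, fresh), 1):
    print(f"game {i:2d}: shared={s}  fresh={f}")
```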