but I recently tried again to see if it could learn at runtime not to lose in the same way multiple times. It couldn’t. I was able to play the same strategy over and over again in the same chat history and win every time.
I wonder if having the losses in the chat history would instead be training/reinforcing it to lose every time.