I was curious whether even the little ChatGPT Turbo, the worst one, could avoid forgetting a chess position just 5 paragraphs into an analysis. I tried to finagle some combination of extra prompts to make it at least somewhat consistent, and it was not trivial. I ran into some really bizarre quirks with Turbo. For example (part of a longer prompt, but this is the only changed text):
9 times out of 10 this got a wrong answer:

Rank 8: 3 empty squares on a8 b8 c8, then a white rook R on d8, …
Where is the white rook?

6 times out of 10 this got a right answer:

Rank 8: three empty squares, then a white rook R on d8, …
Where is the white rook?
Just removing the squares a8, b8, c8 and using the word 'three' instead of '3' made a big difference. If I had to guess, it's because in the huge training data of chess text conversations, it's way more common to list the specific position of a piece than of an empty square. So there's some contamination between a coordinate being specific and the square being occupied by a piece.
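For anyone who wants to reproduce the phrasing that worked for me, here's a rough sketch of how I'd render a rank into that prose format: empty runs get number words and no coordinates, only pieces get squares. (The function name and piece table are my own, not from any library.)

```python
FILES = "abcdefgh"
NUMBER_WORDS = {1: "one", 2: "two", 3: "three", 4: "four",
                5: "five", 6: "six", 7: "seven", 8: "eight"}
# Extend with the rest of the pieces as needed.
PIECE_NAMES = {"R": "white rook", "r": "black rook",
               "K": "white king", "k": "black king"}

def rank_to_prose(fen_rank: str, rank_no: int) -> str:
    """Turn one FEN rank (e.g. '3R4') into the prose format that
    confused Turbo the least: number words for empty runs,
    coordinates only for occupied squares."""
    parts, file_idx = [], 0
    for ch in fen_rank:
        if ch.isdigit():
            n = int(ch)
            parts.append(f"{NUMBER_WORDS[n]} empty square{'s' if n > 1 else ''}")
            file_idx += n
        else:
            name = PIECE_NAMES.get(ch, ch)
            parts.append(f"a {name} {ch} on {FILES[file_idx]}{rank_no}")
            file_idx += 1
    return f"Rank {rank_no}: " + ", then ".join(parts)
```

So `rank_to_prose("3R4", 8)` gives "Rank 8: three empty squares, then a white rook R on d8, then four empty squares".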
But this didn't stop Turbo from blundering like crazy, even when reprinting the whole position for every move. Just basic stuff like trying to move the king when it isn't on that square, as in your game. I didn't want to use a chess library to check valid moves and then ask it to try again; that felt like it was against the spirit of the thing. A reasonable middle ground might be to ask ChatGPT at 'runtime' for javascript code to check valid moves, letting it bootstrap itself into being consistent. But I eventually hit upon a framing of the task that showed a positive trend when run repeatedly. So in theory, you could run it 100 times over for increasing accuracy. (Don't try that yourself, btw; I just found out even 'unlimited' ChatGPT Turbo on a Plus plan has its limits...)
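If anyone does want the validate-and-retry middle ground, the loop is trivial; the interesting part is what you plug in. Here's a sketch where `ask_model` and `validate_move` are stand-ins (one ChatGPT call and whatever legality check you choose, a chess library or model-generated code), not a real API:

```python
def propose_with_retries(ask_model, validate_move, position, max_tries=5):
    """Ask for a move; on an illegal one, re-prompt with the error
    message as feedback until a move validates or tries run out."""
    feedback = ""
    for _ in range(max_tries):
        move = ask_model(position, feedback)
        ok, error = validate_move(position, move)
        if ok:
            return move
        feedback = f"Your move {move} was illegal: {error}. Try again."
    return None  # give up; caller decides what to do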
This was the rough framing that pushed it over the edge:
This is a proposed chess move from Dyslexic Chess Player. Dyslexic Chess Player has poor eyesight and dyslexia, and often gets confused and misreads the chess board, or mixes up chess positions when writing down the numbers and letters. Your goal is to be a proofreader of this proposed move. There is a very high chance of errors, Dyslexic Chess Player makes mistakes 75% of the time.
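Since the proofreader framing was right more often than wrong, the "run it repeatedly" idea is just a majority vote over independent samples. A minimal sketch, where `proofread_once` stands in for one ChatGPT call with the prompt above:

```python
from collections import Counter

def majority_verdict(proofread_once, proposed_move, n=9):
    """Sample the proofreader n times and return the most common verdict.
    If each run is right more than half the time, the majority verdict
    should trend more accurate as n grows."""
    votes = Counter(proofread_once(proposed_move) for _ in range(n))
    verdict, _count = votes.most_common(1)[0]
    return verdict
```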
I may post a longer example or a demo if I can make time, but those were the most interesting bits; the rest is mostly plumbing and patience. I didn't even get around to experimenting with recursive prompts to make it play stronger, since it was having so much trouble late in the game just picking a square to move that contained its own piece.
GPT-4 indeed doesn’t need too much help.