How did you play? Just SAN?
I am using the following prompt:
“We are playing a chess game. At every turn, repeat all the moves that have already been made. Find the best response for Black. I’m White and the game starts with 1.e4
So, to be clear, your output format should always be:
PGN of game so far: …
Best move: …
and then I get to play my move.”
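If you wanted to drive this exchange through the API instead of the chat UI, a rough sketch with the openai Python package could look like the below (the pre-1.0 ChatCompletion interface, the "gpt-4" model name, and the parsing of the "Best move:" line are assumptions for illustration, not how these games were actually played):

```python
# Rough sketch: send the same prompt via the API and keep the full move history
# in the conversation. Assumes the pre-1.0 openai package (openai.ChatCompletion)
# and that the model sticks to the "PGN of game so far: ... / Best move: ..." format.
import openai

PROMPT = (
    "We are playing a chess game. At every turn, repeat all the moves that have "
    "already been made. Find the best response for Black. I'm White and the game "
    "starts with 1.e4\n"
    "So, to be clear, your output format should always be:\n"
    "PGN of game so far: ...\n"
    "Best move: ...\n"
    "and then I get to play my move."
)

def black_reply(messages):
    """Send the conversation so far and pull the move out of the 'Best move:' line."""
    response = openai.ChatCompletion.create(model="gpt-4", messages=messages)
    text = response["choices"][0]["message"]["content"]
    move = next(line.split(":", 1)[1].strip()
                for line in text.splitlines()
                if line.lower().startswith("best move"))
    return text, move

messages = [{"role": "user", "content": PROMPT}]
text, move = black_reply(messages)                       # Black answers 1.e4
messages.append({"role": "assistant", "content": text})
messages.append({"role": "user", "content": "2.Nf3"})    # then White plays the next move
```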
With ChatGPT pre-GPT4 and Bing, I also added the fiction that it could consult Stockfish (or Kasparov, or someone else known to be strong), which seemed to help it make better moves. GPT4 does not seem to need this, and rightly pointed out that it does not have access to Stockfish when I tried the Stockfish version of this prompt.
For ChatGPT pre-GPT4, the very strict instructions above resulted in an ability to play reasonable, full games, which was not possible when just exchanging single moves in algebraic notation. I have not tested whether this still makes a difference with GPT4.
On the rare occasions where it gets the history of the game wrong or suggests an illegal move, I regenerate the response or reprompt with the game history so far. I accept all legal moves made with correct game history as played.
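That acceptance rule is easy to check mechanically; here is a small sketch using the python-chess package (just one possible way to do it, not necessarily how these games were checked), where push_san raises a ValueError for any SAN move that does not fit the position:

```python
# Sketch of the acceptance rule: replay the model's claimed history, then try its
# suggested move. python-chess's push_san() raises a ValueError subclass when a
# SAN move is illegal, ambiguous, or malformed in the current position.
import chess

def acceptable(history_san, suggested_move):
    board = chess.Board()
    try:
        for san in history_san:
            board.push_san(san)          # fails if the repeated history is wrong
        board.push_san(suggested_move)   # fails if the suggested move is illegal
    except ValueError:
        return False                     # regenerate / reprompt with the real history
    return True

print(acceptable(["e4", "e5", "Nf3"], "Nc6"))  # True: legal reply for Black
print(acceptable(["e4", "e5", "Nf3"], "Ne4"))  # False: no Black knight can reach e4
```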
I’ve collected all of my test games in a lichess study here:
https://lichess.org/study/ymmMxzbj
Ahh, I should have thought of having it repeat the history! Good prompt engineering. Will try it out. The GPT4 gameplay in your lichess study is not bad!
I tried by just asking it to play and use SAN. I had it explain its moves, which it did well, and it also commented on my (intentionally bad) play. It quickly made a mess of things though, and clearly lost track of the board state (to the extent it’s “tracking” it at all; it’s hard to say exactly how it’s playing once past the common openings), even though the game history should’ve been in the context window.
I don’t know how they did it, but I played a chess game against GPT4 by saying the following:
“I’m going to play a chess game. I’ll play white, and you play black. On each chat, I’ll post a move for white, and you follow with the best move for black. Does that make sense?”
And then going through the moves 1-by-1 in algebraic notation.
My experience largely follows GoteNoSente’s. I played one full game that lasted 41 moves, and all of GPT4’s moves were reasonable. It did make one invalid move when I forgot to include the move number before my move (e.g. Ne4 instead of 12. Ne4), but it fixed it once I included the number. Also, I think it was better in the opening than in the endgame, probably because of the large number of similar openings in its training data.
Interesting, I tried the same experiment on ChatGPT and it didn’t seem able to keep an accurate representation of the current game state and would consistently make moves that were blocked by other pieces.