That is odd. I certainly had a much, much higher completion rate than 1 in 40; in fact I had no games that I had to abandon with my prompt. However, I played manually, and played well enough that it mostly did not survive beyond move 30 (although my collection has a blindfold game that went beyond move 50), and checked at every turn that it reproduced the game history correctly, reprompting if that was not the case. Also, for GPT3.5 I supplied it with the narrative fiction that it could access Stockfish. Mentioning Stockfish might push it towards more precise play.
Trying again today, ChatGPT 3.5 using the standard chat interface did however seem to have a propensity to listing only White moves in its PGN output, which is not encouraging.
For exact reproducibility, I have added a game played via the API at temperature zero to my collection and given exact information on model, prompt and temperature in the PGN:
If your scripts allow testing this prompt, I’d be interested in seeing what completion rate/approximate rating relative to some low Stockfish level is achieved by chatgpt-3.5-turbo.
That is odd. I certainly had a much, much higher completion rate than 1 in 40; in fact I had no games that I had to abandon with my prompt. However, I played manually, and played well enough that it mostly did not survive beyond move 30 (although my collection has a blindfold game that went beyond move 50), and checked at every turn that it reproduced the game history correctly, reprompting if that was not the case. Also, for GPT3.5 I supplied it with the narrative fiction that it could access Stockfish. Mentioning Stockfish might push it towards more precise play.
Trying again today, ChatGPT 3.5 using the standard chat interface did however seem to have a propensity to listing only White moves in its PGN output, which is not encouraging.
For exact reproducibility, I have added a game played via the API at temperature zero to my collection and given exact information on model, prompt and temperature in the PGN:
https://lichess.org/study/ymmMxzbj/SyefzR3j
If your scripts allow testing this prompt, I’d be interested in seeing what completion rate/approximate rating relative to some low Stockfish level is achieved by chatgpt-3.5-turbo.