It plays at around 1000 Elo and consistently makes legal moves until about 20-30 moves in, after which its performance tends to break down.
Do we know whether this happens because the conversation runs out of context window and starts losing the earliest prompts?
If not, it could be that reconstructing the board state from the score alone is becoming too onerous (i.e., playing “blindfolded” has gotten too hard). In that case, I wonder whether replacing the score with a representation of the board in Forsyth-Edwards Notation (FEN), as I suggested in my other comment, would improve things.
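For anyone who hasn't seen FEN: it encodes the whole position in one line, so the model wouldn't have to replay the game to know where the pieces are. Here is a minimal sketch of the encoding, assuming the board is already available as 8 strings (rank 8 first, '.' for an empty square). The function names are my own, and it does no legality checking and doesn't derive castling or en passant rights; you pass those in yourself.

```python
VALID = set("pnbrqkPNBRQK.")  # FEN piece letters plus '.' for empty squares


def placement_to_fen(rows: list[str]) -> str:
    """Encode 8 ranks (rank 8 first) into FEN's piece-placement field.

    Uppercase = White, lowercase = Black; runs of empty squares
    become a digit, and ranks are separated by '/'.
    """
    if len(rows) != 8 or any(len(r) != 8 or set(r) - VALID for r in rows):
        raise ValueError("expected 8 rows of 8 valid squares")
    ranks = []
    for row in rows:
        out, empty = [], 0
        for sq in row:
            if sq == ".":
                empty += 1
            else:
                if empty:
                    out.append(str(empty))
                    empty = 0
                out.append(sq)
        if empty:
            out.append(str(empty))
        ranks.append("".join(out))
    return "/".join(ranks)


def to_fen(rows, side="w", castling="KQkq", en_passant="-", halfmove=0, fullmove=1) -> str:
    """Append the remaining five FEN fields (supplied by the caller) to the placement."""
    return f"{placement_to_fen(rows)} {side} {castling} {en_passant} {halfmove} {fullmove}"


# Position after 1. e4
after_e4 = [
    "rnbqkbnr", "pppppppp", "........", "........",
    "....P...", "........", "PPPP.PPP", "RNBQKBNR",
]
print(to_fen(after_e4, side="b", en_passant="e3"))
```

In practice you'd probably let a chess library track the position and emit the FEN for you; this just shows what the string fed to the model would look like.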
I don’t think it’s a question of the context window: the same thing happens if you start anew with the original “magic prompt” and the whole current score. And the current score alone is short, at most ~100 tokens, easily enough to fit in the context window of even a much smaller model.
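To make the “start anew” test concrete, here is a rough sketch of rebuilding the prompt from scratch each turn, so nothing from earlier turns is carried over. `MAGIC_PROMPT` is a hypothetical placeholder (the actual prompt isn't reproduced here), and the moves are assumed to be in standard algebraic notation in play order.

```python
# Hypothetical placeholder; substitute the actual "magic prompt".
MAGIC_PROMPT = "<original magic prompt goes here>"


def format_score(moves: list[str]) -> str:
    """Turn SAN moves in play order into a numbered score, e.g. '1. e4 e5 2. Nf3'."""
    parts = []
    for i, move in enumerate(moves):
        if i % 2 == 0:  # a White move starts a new move number
            parts.append(f"{i // 2 + 1}.")
        parts.append(move)
    return " ".join(parts)


def fresh_prompt(moves: list[str]) -> str:
    """Build a brand-new prompt: the magic prompt followed by the whole current score."""
    return f"{MAGIC_PROMPT}\n\n{format_score(moves)}"


print(fresh_prompt(["e4", "e5", "Nf3", "Nc6"]))
```

Since the entire game state lives in that one short string, a breakdown that still appears with a freshly built prompt points away from context-window loss.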
In my experience, FEN also doesn’t tend to help; see my other comment.
Good question, but no: ChatGPT still makes occasional mistakes even when you use the GPT API, which gives you full visibility into and control over the context window.