This happened with a 2.7B GPT I trained from scratch on PGN chess games. It was strong (~1800 Elo for short games), but if the game got sufficiently long it would start making more seemingly nonsensical moves, probably because it was having trouble keeping track of the board state.
Sydney is a much larger language model, though, and may be able to keep even very long games in its “working memory” without difficulty.