We never ran Elo tests, but the 2.7B model, trained from scratch on human games in PGN notation, beat both me and my colleague (~1800 Elo). It would start making mistakes if the game went on very long, though (we hypothesized it had trouble reconstructing the board state from long PGN contexts), so you could beat it by drawing the game out.