This seems more plausible post hoc. There should be plenty of transcripts of random algorithms as baseline versus effective chess algorithms in the training set, and the prompt suggests strong play.
> There should be plenty of transcripts of random algorithms as baseline versus effective chess algorithms in the training set
I wouldn’t think that. I’m not sure I’ve seen a random-play transcript of chess in my life. (I wonder how long those games would have to be for random moves to end in checkmate?)
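The parenthetical is easy to check empirically. A minimal Monte Carlo sketch, assuming the third-party python-chess package is available (none of this is from the thread itself):

```python
# Estimate how long random-vs-random chess games run and how often they
# actually end in checkmate rather than a draw or the ply cutoff.
import random
import chess

def random_game(max_plies=1000):
    board = chess.Board()
    while not board.is_game_over() and board.ply() < max_plies:
        board.push(random.choice(list(board.legal_moves)))
    return board

games = [random_game() for _ in range(200)]
lengths = [b.ply() for b in games]
mates = sum(b.is_checkmate() for b in games)
print(f"mean plies: {sum(lengths) / len(lengths):.0f}, "
      f"checkmates: {mates}/{len(games)}")
```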
> the prompt suggests strong play.
Which, unlike random move transcripts, is what you would predict, since the Superalignment paper says the GPT chess PGN dataset was filtered for Elo (“only games with players of Elo 1800 or higher were included in pretraining”), in standard behavior-cloning fashion.
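For concreteness, that kind of Elo filter is a few lines over a PGN dump. A hedged sketch (not the paper’s actual pipeline; the file names are made up and the 1800 cutoff comes from the quote above), again using python-chess:

```python
# Keep only games in which both players are rated min_elo or higher,
# in the spirit of the Elo filtering described above.
import chess.pgn

def filter_by_elo(in_path="games.pgn", out_path="games_1800plus.pgn", min_elo=1800):
    kept = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        while True:
            game = chess.pgn.read_game(src)
            if game is None:  # end of file
                break
            try:
                white = int(game.headers.get("WhiteElo", "0"))
                black = int(game.headers.get("BlackElo", "0"))
            except ValueError:  # missing or malformed ratings, e.g. "?"
                continue
            if white >= min_elo and black >= min_elo:
                print(game, file=dst, end="\n\n")  # str(game) is full PGN
                kept += 1
    return kept
```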
I don’t know, I almost instantly found a transcript of a human stomping a random agent on reddit:
https://www.reddit.com/r/chess/comments/2rv7fr/randomness_vs_strategy/
This sort of thing probably would have been scraped?
I was thinking that plenty would appear as the only baseline a teenage amateur RL enthusiast might beat before getting bored, but I haven’t found any examples of anyone actually posting such transcripts after a few minutes of effort, so maybe you’re right.
Chess-specific training sets won’t contain a lot of random play.
I am more interested in any direct evidence that makes you suspect LLMs are good at chess when prompted appropriately?
A human player beating a random player isn’t two random players.
> I am more interested in any direct evidence that makes you suspect LLMs are good at chess when prompted appropriately?
Well, there’s the DM bullet-chess GPT as a drastic proof of concept. If you believe that LLMs cannot learn to play chess, you have to explain how things like that work.
A random player against a good player is exactly what we’re looking for, right? If all transcripts with one random player had two random players, then LLMs should play randomly when their opponents play randomly, but if most transcripts with a random player have it getting stomped by a superior algorithm, that’s what we’d expect from base models (and we should be able to elicit it more reliably with careful prompting).
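That experiment is cheap to run. A sketch under stated assumptions: python-chess handles the board and the random side, the PGN header naming a strong player is one arbitrary choice of “careful prompting”, and complete() is a placeholder for whatever text-completion endpoint is being probed (it is not a real API call and must be filled in):

```python
# Random mover (Black) vs. an LLM prompted to continue a PGN transcript
# whose header implies strong White play. Illegal or empty completions
# fall back to a random move (worth counting separately in a real run).
import random
import chess

HEADER = '[White "Garry Kasparov"]\n[Black "Random Mover"]\n[Result "*"]\n\n'

def complete(prompt: str) -> str:
    """Placeholder: return the model's raw continuation of `prompt`."""
    raise NotImplementedError

def pgn_prompt(sans):
    # Rebuild "1. e4 e5 2. Nf3 ..." and end with the next move number so the
    # model is cued to produce White's next move.
    parts = []
    for i, san in enumerate(sans):
        if i % 2 == 0:
            parts.append(f"{i // 2 + 1}.")
        parts.append(san)
    parts.append(f"{len(sans) // 2 + 1}.")
    return HEADER + " ".join(parts)

board, sans, fallbacks = chess.Board(), [], 0
while not board.is_game_over():
    if board.turn == chess.WHITE:
        try:
            move = board.parse_san(complete(pgn_prompt(sans)).strip().split()[0])
        except (ValueError, IndexError):
            move, fallbacks = random.choice(list(board.legal_moves)), fallbacks + 1
    else:
        move = random.choice(list(board.legal_moves))
    sans.append(board.san(move))
    board.push(move)
print(board.result(), f"(fallbacks: {fallbacks})")
```

The interesting comparison would be this setup against the same loop with a header implying a weak or random White player, to see whether the prompt actually shifts the base model away from imitating the random side.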
I see no reason that transformers can’t learn to play chess (or any other reasonable game) if they’re carefully trained on board state evaluations etc. This is essentially policy distillation (from a glance at the abstract). What I’m interested in is whether LLMs have absorbed enough general reasoning ability that they can learn to play chess the hard way, like humans do—by understanding the rules and thinking it through zero-shot. Or at least transfer some of that generality to performing better at chess than would be expected (since they in fact have the advantage of absorbing many games during training and don’t have to learn entirely in context). I’m trying to get at that question by investigating how LLMs do at chess—the performance of custom trained transformers isn’t exactly a crux, though it is somewhat interesting.
Yes.