This is what all that talk about predictive loss was for. Training on predictive loss gets you systems that are especially well-suited to being described as learning the time-evolution dynamics of the training distribution. Not in the sense that they’re simulating the physical reality underlying the training distribution, merely in the sense that they’re learning dynamics for the behavior of the training data.
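For concreteness, the objective in question is ordinary next-token cross-entropy. Here's a minimal sketch in PyTorch, assuming a `model` that maps a batch of token ids of shape (batch, seq) to next-token logits (the names are illustrative, not from any particular codebase):

```python
import torch.nn.functional as F

def predictive_loss(model, tokens):
    # Predict token t+1 from everything up through token t.
    logits = model(tokens[:, :-1])   # (batch, seq-1, vocab)
    targets = tokens[:, 1:]          # the tokens that actually came next
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),  # flatten to (batch*(seq-1), vocab)
        targets.reshape(-1),
    )
```

Nothing in this loss asks the model to recover the physical process that generated the text; it only rewards tracking how the text itself tends to continue.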
Sure, you could talk about AlphaZero in terms of prediction. But it’s not going to have the sort of configurability that makes the simulator framing so fruitful in the case of GPT (or in the case of computer simulations of the physical world). You can’t feed AlphaZero the first 20 moves of a game by Magnus Carlsen and have it continue like him.
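To make the contrast concrete, here's the kind of interface GPT exposes and AlphaZero doesn't: condition a base model on a prefix and let the learned dynamics roll forward. A sketch using the Hugging Face `transformers` API with the small `gpt2` checkpoint (the PGN header and moves are illustrative; in practice you'd want a much larger base model for anything recognizably Carlsen-like):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Opening moves in PGN, attributed to Carlsen in the header, as a prompt.
prompt = '[White "Carlsen, Magnus"]\n\n1. d4 Nf6 2. c4 g6 3. Nc3 d5 4.'
inputs = tokenizer(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=40, do_sample=True)
print(tokenizer.decode(out[0]))
```

There is no analogous knob on AlphaZero: its policy network takes a board position and returns its own move preferences, with no channel for 'continue in the style of this player.'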
Or, to use a different example: one case where talking about simulators earns its keep is when someone asks "Does GPT know this fact?" GPT's dynamics are inhomogeneous—whether it acts like it knows the fact depends on the context it's conditioned on, rather than being a fixed property of the model. AlphaZero's training process, by contrast, actively drives out that kind of inhomogeneity: AlphaZero isn't trained to mimic a training distribution, it's trained to play high-scoring moves.
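One way to see that inhomogeneity directly, sketched again with `transformers` and `gpt2` (the prompts are made up for illustration): pose the same fact under two different framings and watch the apparent knowledge shift with the context.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Same fact, two contexts: a quiz framing vs. a sloppy-forum framing.
contexts = [
    "Q: What is the capital of France?\nA:",
    "lol i flunked geography, the capital of france is",
]
for prompt in contexts:
    inputs = tokenizer(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=10, do_sample=True)
    # Print only the continuation, not the prompt.
    print(repr(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:])))
```

Whether the model 'knows' the answer here isn't a yes/no property of the weights; it's a property of the trajectory the context puts it on.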
The simulator framing has no accuracy advantage over thinking directly in terms of next-token prediction, except that thinking in terms of simulators and simulacra sometimes usefully compresses the relevant ideas, and so lets people think larger thoughts at once. Probably useful for coming up with ChatGPT jailbreaks. Definitely useful for coming up with prompts for base GPT.