I haven’t seen any work trying to make this comparison in a rigorous way. I’m referring to the common opinion, which I share, that LLMs are the current high-water mark of AI (and AGI) research. Most of the best ones use predictive learning for the majority of training and RL for fine-tuning; but there’s some work indicating that it’s just as effective to do that fine-tuning with more predictive learning on a selected dataset (either hand-selected by humans for preferred responses, similar to the preference data used in RLHF, or produced by an LLM applying some written criteria, similar to constitutional AI).
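To make that selected-dataset alternative concrete, here's a minimal sketch; every function name in it is a hypothetical stand-in, not a real library API. The idea is that the preference signal determines which (prompt, response) pairs enter the training set, and the update itself stays plain predictive learning:

```python
# A minimal, illustrative sketch of fine-tuning via predictive learning on a
# selected dataset. All names here are hypothetical stand-ins, not a real API.

def generate(prompt: str) -> list[str]:
    """Stand-in for sampling candidate responses from the base model."""
    return [f"{prompt} -> draft A", f"{prompt} -> draft B"]

def preferred(prompt: str, response: str) -> bool:
    """Stand-in for the selection step: a human preference label, or an LLM
    judging the response against written criteria (constitutional-AI style)."""
    return response.endswith("A")

def build_finetune_set(prompts: list[str]) -> list[tuple[str, str]]:
    # The preference signal is baked into which pairs survive selection;
    # the subsequent update is ordinary next-token prediction on these pairs.
    return [(p, r) for p in prompts for r in generate(p) if preferred(p, r)]

print(build_finetune_set(["Explain RLHF"]))
```

No reward model or policy-gradient step appears anywhere; selection replaces reward.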
Prior to this, AlphaZero and its family of RL algorithms were commonly considered the high-water mark.
Each has strengths and weaknesses, so I and others suspect that a combination of the two may continue to be the most effective approach.
The core intuition for why predictive learning would be overall more powerful is that there’s more to learn from. Predictive learning provides a large error vector at every step; RL provides a single scalar signal. And prediction doesn’t need any labeling of the data. If I have a stream of data, I can predict what happens next and get that large vector signal for free. Even if I only have a collection of static data, I can mask out chunks of it and have the system predict what occurs in those chunks (as current vision model training does). RL relies on having a marker of what is good or bad in the data; that has to be added either by hand or by an algorithm (like a score in a game environment or an energy measure for protein folding). Powerful critic systems can extrapolate limited reward information, but that still gives at most a single scalar signal for each set of input data, whereas prediction learns from the full vector signal for every input.
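Here's a minimal sketch of that asymmetry, assuming PyTorch; the tensor shapes and the REINFORCE-style update are illustrative, not drawn from any particular training codebase. For the same sequence, the predictive loss has a per-position target, while the RL loss judges the whole sequence with one number:

```python
# Contrast the two training signals on one sequence of length seq_len.
import torch
import torch.nn.functional as F

vocab_size, seq_len = 50_000, 128
logits = torch.randn(seq_len, vocab_size, requires_grad=True)  # stand-in model output
targets = torch.randint(0, vocab_size, (seq_len,))             # observed next tokens

# Predictive learning: one supervised target per position. The gradient of
# this loss touches all seq_len * vocab_size logits, telling the model how
# wrong it was at every single step.
pred_loss = F.cross_entropy(logits, targets)

# RL (REINFORCE-style): the sequence is judged by a single scalar reward
# (here we reuse `targets` as the sampled tokens for brevity). Every
# position's log-probability is scaled by that one number, so all positions
# receive the same coarse good/bad feedback.
reward = torch.tensor(1.0)  # e.g. +1 for a "good" response, from a human or critic
log_probs = F.log_softmax(logits, dim=-1)[torch.arange(seq_len), targets]
rl_loss = -(reward * log_probs.sum())

print(pred_loss.item(), rl_loss.item())
```

The exact losses aren't the point; the point is that the predictive loss constrains every logit at every position, while the RL loss scales everything by one scalar.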
That’s more than you asked for; the sad answer to your actual question is no, I don’t have a good reference.