I’m not sure how you intend your predictive-coding point to be understood, but from my perspective it seems like a complaint about the underlying tech rather than the results, which seems out of place. If backprop can do the job, then who cares? I would be interested to know if you can name something predictive coding has already accomplished which you believe to be fundamentally unattainable for backprop. lsusr thinks the two have been unified into one theory.
I don’t buy that animals somehow plug into “base reality” by predicting sensory experiences, while transformers somehow miss out on it by predicting text and images and video. Reality has lots of parts. Animals and transformers both plug into some limited subset of it.
I would guess raw transformers could handle some real-time robotics tasks if scaled up sufficiently, but I do agree that raw transformers would be missing something important architecture-wise. However, I also think it is plausible that only a little bit more architecture is needed (and that the ‘little bit more’ corresponds to things people have already been thinking about) -- things such as the features added in the generative agents paper. (I realize, of course, that this paper is far from real-time robotics.)
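To gesture at what I mean by ‘a little bit more architecture’, here is a rough sketch (names and details are mine, loosely inspired by the memory-and-retrieval loop in that paper, not its actual code) of a frozen sequence model wrapped in an observe / retrieve / act / remember loop:

```python
from dataclasses import dataclass, field

def call_sequence_model(prompt: str) -> str:
    # Stand-in for a real transformer call; returns a canned action here.
    return "move toward the charging station"

@dataclass
class Agent:
    # A growing store of past observations and actions (a "memory stream").
    memories: list = field(default_factory=list)

    def retrieve(self, observation: str, k: int = 3) -> list:
        # Toy relevance score: number of words shared with the observation.
        words = set(observation.split())
        ranked = sorted(self.memories, key=lambda m: -len(words & set(m.split())))
        return ranked[:k]

    def act(self, observation: str) -> str:
        # Feed the model only the observation plus a few relevant memories.
        context = self.retrieve(observation) + [observation]
        action = call_sequence_model("\n".join(context))
        self.memories.append(f"saw: {observation}; did: {action}")
        return action

agent = Agent()
print(agent.act("battery low, charging station visible to the left"))
```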
Anyway, high uncertainty on all of this.
No, I was talking about the results. lsusr seems to use the term in a different sense than Scott Alexander or Yann LeCun. In their sense it’s not an alternative to backpropagation, but a way of constantly predicting future experience and constantly updating a world model depending on how far off those predictions are. Somewhat analogous to conditionalization in Bayesian probability theory.
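For concreteness, here is a toy sketch of the loop I mean (purely illustrative, not anyone’s actual model): predict the next sense datum, measure the prediction error, and revise the world model in proportion to it.

```python
import numpy as np

# Toy predict-error-update loop, loosely like repeated Bayesian
# conditionalization on incoming evidence. All numbers are arbitrary.

rng = np.random.default_rng(0)
true_signal = 3.0       # hidden quantity generating the sense data
belief = 0.0            # the agent's current world-model estimate
learning_rate = 0.1     # how strongly an error revises the belief

for _ in range(200):
    observation = true_signal + rng.normal(scale=0.5)  # noisy sense datum arrives
    prediction = belief                                # predict the experience first
    error = observation - prediction                   # how far off was it?
    belief += learning_rate * error                    # update toward the evidence

print(round(belief, 2))  # settles near 3.0
```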
LeCun talks about the technical issues in the interview above. In contrast to next-token prediction, the problem of predicting appropriate sense data is not yet solved for AI. Apart from doing it in real time, the other issue is that (e.g.) for video frames a probability distribution over all possible experiences is not feasible, in contrast to text tokens. The space of possibilities is too large, so some form of closeness measure is required, or imprecise predictions that only cover the “relevant” parts of future experience.
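To make the contrast concrete, a toy sketch (shapes and names are my own assumptions): a text model can put an explicit probability on every possible next token, whereas for a video frame the practical option is to predict some compressed representation and score it with a closeness measure rather than a full distribution.

```python
import torch
import torch.nn.functional as F

# Text: an explicit distribution over a finite vocabulary is feasible.
vocab_size = 50_000
token_logits = torch.randn(1, vocab_size)              # one logit per candidate next token
actual_next_token = torch.tensor([42])
text_loss = F.cross_entropy(token_logits, actual_next_token)

# Video: no tractable distribution over all possible frames, so predict
# a compressed summary of the frame and use a closeness measure instead.
embed_dim = 512
predicted_frame_embedding = torch.randn(1, embed_dim)  # prediction of the frame's summary
actual_frame_embedding = torch.randn(1, embed_dim)     # encoder output for the observed frame
frame_loss = F.mse_loss(predicted_frame_embedding, actual_frame_embedding)
```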
In the meantime, OpenAI has presented Sora, a video generation model. But according to the announcement, it is a diffusion model which generates all frames in parallel, so it doesn’t seem like a step toward solving predictive coding.
Edit: Maybe it will eventually turn out to be possible to implement predictive coding using transformers. Even assuming this works, it wouldn’t be appropriate to call transformers AGI before that achievement is actually made. Otherwise we would have to identify the invention of “artificial neural networks” decades ago with the invention of AGI, since AGI will probably be based on ANNs. My main point is that AGI (a system with high generality) is something that could be scaled up (e.g. by training a larger model) to superintelligence without requiring major new intellectual breakthroughs, breakthroughs like figuring out how to get predictive coding to work. This is analogous to how a human brain seems to be broadly a scaled-up dog brain, and thus didn’t involve a major “breakthrough” in the way it works. Smarter animals are mostly smarter in the sense that they are better at prediction.
I haven’t watched the LeCun interview you reference (it is several hours long, so relevant time-stamps to look at would be appreciated), but this still does not make sense to me: backprop already seems like a way to constantly predict future experience and update, particularly as it is employed in LLMs. Generating predictions first and then updating based on error is how backprop training works. And a closeness measure is required, just as you emphasize; the training loss already plays that role.
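As a minimal sketch of what I mean (a generic next-token training step with a tiny linear model standing in for a transformer, not any particular system's code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(16, 100)                          # context features -> vocab logits
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

context = torch.randn(1, 16)                        # the "experience so far"
actual_next_token = torch.tensor([7])               # what actually came next

logits = model(context)                             # 1. predict the future token
loss = F.cross_entropy(logits, actual_next_token)   # 2. measure how far off the prediction was
optimizer.zero_grad()
loss.backward()                                     # 3. backpropagate the error
optimizer.step()                                    # 4. update the model accordingly
```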
Well, backpropagation alone wasn’t even enough to make efficient LLMs feasible. It took decades, till the invention of transformers, to make them work. Similarly, knowing how to make LLMs is not yet sufficient to implement predictive coding. LeCun talks about the problem in a short section here from 10:55 to 14:19.