No, I was talking about the results. lsusr seems to use the term in a different sense than Scott Alexander or Yann LeCun. In their sense it’s not an alternative to backpropagation, but a way of constantly predicting future experience and constantly updating a world model depending on how far off those predictions are. This is somewhat analogous to conditionalization in Bayesian probability theory.
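To make that intended sense concrete, here is a minimal toy sketch of such a loop, with all names and numbers made up for illustration (no particular library or model is implied): the system keeps a world model, predicts the next observation, and nudges the model in proportion to the prediction error.

```python
import numpy as np

# Toy sketch of a predictive-coding-style loop: predict the next observation,
# compare it with what actually arrives, update the world model by the error.
rng = np.random.default_rng(0)

# Toy "world": a hidden drifting scalar that emits noisy observations.
hidden = 0.0
def observe():
    global hidden
    hidden += 0.05                      # slow drift the model has to track
    return hidden + rng.normal(0, 0.1)  # noisy sense datum

# Toy "world model": a single estimate of the hidden state plus a learning rate.
estimate, lr = 0.0, 0.3

for t in range(50):
    prediction = estimate + 0.05        # model's guess about the next observation
    actual = observe()                  # the experience that actually arrives
    error = actual - prediction         # prediction error
    estimate = prediction + lr * error  # update in proportion to how far off we were

print(f"final estimate {estimate:.2f}, true hidden state {hidden:.2f}")
```

The only point is the shape of the loop: prediction comes first, and the update is driven by how far off the prediction was, loosely like updating on evidence in Bayesian conditionalization.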
LeCun talks about the technical issues in the interview above. In contrast to next-token prediction, the problem of predicting appropriate sense data has not yet been solved for AI. Apart from doing it in real time, the other issue is that, for video frames for example, a probability distribution over all possible experiences is not feasible, in contrast to text tokens. The space of possibilities is too large, so some form of closeness measure is required, or imprecise predictions that only capture the “relevant” parts of future experience.
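To illustrate the asymmetry with a toy sketch (the sizes are made up and nothing here is a claim about how any real model works): for text, a full distribution over the next token is just a vector of vocabulary length, while the space of possible next video frames is far too large to enumerate, so one has to fall back on some closeness measure between predicted and actual frames.

```python
import numpy as np

vocab_size = 50_000
# Text: a distribution over the next token is just a vector of length
# vocab_size, which can be scored with cross-entropy against the token
# that actually occurred.
token_logits = np.zeros(vocab_size)
token_probs = np.exp(token_logits) / np.exp(token_logits).sum()

# Video: even a tiny 64x64 RGB frame with 256 intensity levels has
# 256 ** (64 * 64 * 3) possible values, so an explicit distribution over
# all possible next frames is out of the question.
frame_shape = (64, 64, 3)
predicted_frame = np.zeros(frame_shape)
actual_frame = np.random.default_rng(0).random(frame_shape)

# Instead one falls back on a closeness measure between frames, e.g. a
# pixel-wise squared error (or a distance in a learned latent space),
# which rewards "near enough" predictions rather than exact ones.
closeness = np.mean((predicted_frame - actual_frame) ** 2)
print(f"token distribution sums to {token_probs.sum():.2f}, frame MSE: {closeness:.3f}")
```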
In the meantime, OpenAI did present Sora, a video generation model. But according to the announcement, it is a diffusion model that generates all frames in parallel, so it doesn’t seem like a step toward solving predictive coding.
Edit: Maybe it will eventually turn out to be possible to implement predictive coding using transformers. Even assuming this works, it wouldn’t be appropriate to call transformers AGI before that achievement was made. Otherwise we would have to identify the invention of “artificial neural networks” decades ago with the invention of AGI, since AGI will probably be based on ANNs. My main point is that AGI (a system with high generality) is something that could be scaled up to superintelligence (e.g. by training a larger model) without requiring major new intellectual breakthroughs, such as figuring out how to get predictive coding to work. This is analogous to the way a human brain seems to be broadly similar to a dog brain, only larger, and thus didn’t involve a major “breakthrough” in how it works. Smarter animals are mostly smarter in the sense that they are better at prediction.
I haven’t watched the LeCun interview you reference (it is several hours long, so relevant time-stamps to look at would be appreciated), but this still does not make sense to me: backprop already seems like a way to constantly predict future experience and update on it, particularly as it is employed in LLMs. Generating predictions first and then updating based on the error is how backprop works. And some form of closeness measure is required, just as you emphasize.
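For concreteness, here is a tiny sketch of that reading (a one-parameter toy model trained by gradient descent, not how LLMs are actually trained): predictions are generated first, the squared-error loss acts as the closeness measure, and the parameter is updated so as to shrink the error.

```python
import numpy as np

# Toy "predict, then update on the error" training loop: a one-parameter
# linear predictor fit by gradient descent on a squared-error loss.
rng = np.random.default_rng(1)
xs = rng.uniform(-1, 1, size=100)
ys = 3.0 * xs + rng.normal(0, 0.1, size=100)   # data generated with true slope 3

w, lr = 0.0, 0.1                               # one-parameter model and step size
for step in range(200):
    preds = w * xs                    # 1. generate predictions
    errors = preds - ys               # 2. compare them with what actually happened
    grad = 2 * np.mean(errors * xs)   # 3. gradient of the mean squared error w.r.t. w
    w -= lr * grad                    # 4. update the model to reduce the error

print(f"learned slope {w:.2f} (true slope 3.0)")
```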
Well, backpropagation alone wasn’t even enough to make efficient LLMs feasible. It took decades, until the invention of transformers, to make them work. Similarly, knowing how to build LLMs is not yet sufficient to implement predictive coding. LeCun talks about the problem in a short section here, from 10:55 to 14:19.