I think point 4 is not very justified. For example, chickens have pretty much hardcoded object permanence, while Sora, despite being insanely good at video generation, struggles[1] with it.
My hypothesis here is that object permanence is hard to learn by SGD but very easy for evolution (if you don’t have it, you die; once random search finds it, it spreads as far as possible).
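As a toy illustration of the “you don’t have it, you die” dynamic (my own sketch, not anything from the post; the mortality and mutation numbers are made up), here is how quickly a discrete trait sweeps through a population under hard selection once random search stumbles on it:

```python
import random

POP_SIZE = 1000
MUTATION_RATE = 0.001      # chance a newborn stumbles on the trait by random search
DEATH_WITHOUT_TRAIT = 0.5  # extra mortality for individuals lacking the trait

# False = no object permanence; the population starts without the trait.
population = [False] * POP_SIZE

for generation in range(50):
    # Hard selection: individuals without the trait often die before reproducing.
    survivors = [has_trait for has_trait in population
                 if has_trait or random.random() > DEATH_WITHOUT_TRAIT]
    parents = survivors if survivors else [False]
    # Offspring inherit the parent's trait, or acquire it by rare mutation.
    population = [random.choice(parents) or (random.random() < MUTATION_RATE)
                  for _ in range(POP_SIZE)]
    print(f"generation {generation:2d}: trait frequency = {sum(population) / POP_SIZE:.2f}")
```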
The other example is that, apparently, cognitive specialization in the human (and higher animal) brain has gone so far that the neocortex is incapable of learning conditioned responses, unlike the cerebellum. Moreover, it’s not that the neocortex is merely unspecialized while the cerebellum does this job better: patients with cerebellar atrophy simply don’t acquire conditioned responses, period. I think this casts serious doubt on most analogies between the brain and ANNs.
My personal hypothesis here is that LLMs evolve “backwards” relative to animals. Animals start as a set of fairly simple homeostatic control algorithms and hardcoded, sphexish behavioral programs, and this imposes a hard design constraint on the development of the world-modeling parts of the brain: gaining flexibility and generality must not disrupt already existing, proven neural mechanisms. Speculatively, we can guess that pure world modeling leads to familiar problems like “hallucinations”, so brain evolution is largely directed by the need to filter faulty outputs of the world model. For example, non-human animals reportedly don’t get schizophrenia; it looks like the price of an untamed, overdeveloped predictive model.
I would say that any claims about the “nature of effective intelligence” extrapolated from current LLMs are very speculative. What’s true is that something very weird is going on in the brain, but we know that already.
Your interpretation of instruction tuning as corrigibility is wrong; it’s anything but. We train a neural network to predict text, then we slightly tune its priors towards “if the text is an instruction, its completion follows the instruction”. It’s as if we controlled ants by drawing trails of sugar: yes, the ants will follow those trails, eat surfaces marked with sugar, and eat their way towards wherever we place more of it, but this does not lead to corrigible behavior in superintelligences. I think many-shot jailbreaks are sufficient to carry my point.
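To make the “slightly tune its priors” point concrete, here is a minimal sketch (my own illustration, assuming a Hugging Face causal LM and a couple of made-up instruction/response pairs) showing that instruction tuning uses exactly the pretraining objective, next-token prediction, just on a different text distribution:

```python
# Minimal sketch of instruction tuning as continued next-token prediction.
# The model name and the toy dataset are placeholders; the point is that the
# loss is the pretraining objective, only the data distribution changes.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Hypothetical instruction/response pairs, formatted as plain text.
pairs = [
    ("Translate 'bonjour' to English.", "Hello."),
    ("Name a prime number greater than 10.", "11."),
]

model.train()
for instruction, response in pairs:
    text = f"Instruction: {instruction}\nResponse: {response}"
    batch = tokenizer(text, return_tensors="pt")
    # Same objective as pretraining: predict each next token of the text.
    outputs = model(**batch, labels=batch["input_ids"])
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

Nothing in this loop references obedience or shutdown; the gradient only nudges which continuations the model treats as likely.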
(Prompt: A beautiful homemade video showing the people of Lagos, Nigeria in the year 2056. Shot with a mobile phone camera. You can see a person disappearing between the 3rd and 5th seconds)
(Prompt: A drone camera circles around a beautiful historic church built on a rocky outcropping along the Amalfi Coast… you can see people disappearing at the 10th second)
This reminds me of Moravec’s paradox.