Thank you, this has many interesting points. The takeoff question is at the heart of predicting x-risk: with a soft takeoff, catastrophe seems unlikely; with a hard takeoff, it seems likely.
One point, though. “Foom” was intended as a synonym for “intelligence explosion” and “hard takeoff”, but not for “recursive self-improvement”, although EY saw the latter as the main argument for the former, though not the only one. He wrote:

[Recursive self-improvement] is the biggest, most interesting, hardest-to-analyze, sharpest break-with-the-past contributing to the notion of a “hard takeoff” aka “AI go FOOM”, but it’s nowhere near being the only such factor.
One reason (which EY mentions) that AlphaGo Zero was a big update is that it “foomed” (within its narrow domain) without any recursive self-improvement. It used capability amplification by training on synthetic data, in this case self-play.
This is significant because it is actual evidence that a hard takeoff doesn’t require recursive self-improvement. RSI arguably needs a very strong AI to even get off the ground, namely one that is able to do ML research. The base level for capability amplification seems much lower. So the existence of AlphaGo Zero is a direct argument for foom.
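To make “capability amplification by training on synthetic data” concrete, here is a minimal sketch of the shape of an AlphaGo-Zero-style loop. The helper functions are hypothetical placeholders, not DeepMind’s actual code; the point is only that the training data is generated by the current model rather than by humans.

```python
# Minimal sketch of self-play capability amplification, AlphaGo-Zero-style.
# `play_one_self_play_game` and `gradient_step` are hypothetical placeholders,
# standing in for search-guided self-play and an ordinary supervised update.

def amplify(model, iterations=1000, games_per_iteration=100):
    for _ in range(iterations):
        # Generate synthetic data: the *current* model plays against itself.
        # Each game yields (position, search-improved policy, outcome) examples.
        synthetic_data = []
        for _ in range(games_per_iteration):
            synthetic_data.extend(play_one_self_play_game(model))

        # Train on the model's own amplified play; no human games appear anywhere.
        model = gradient_step(model, synthetic_data)

    return model
```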
As you mention, not many systems so far have successfully used something like capability amplification, but AlphaGo Zero is at least a proof of concept. It demonstrates what is possible, which wasn’t clear before.
Yes, current LLMs work in the opposite way, relying on massive amounts of human-generated text. They essentially perform imitation learning, and it is exactly this that limits their potential. It is not clear how they could ever develop strongly superhuman intelligence by becoming superhuman at predicting human text. Even a perfect chimpanzee imitator wouldn’t be as intelligent as a human. Optimizing purely for imitating human text seems bound to run into diminishing returns. Training on synthetic data doesn’t have this limitation of imitation learning.
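For contrast, the pretraining objective of current LLMs really is just imitation: maximize the likelihood of the next human-written token. A minimal sketch, assuming a generic `model` that maps token ids to next-token logits:

```python
import torch.nn.functional as F

def imitation_loss(model, token_ids):
    """Standard next-token prediction on human-written text.

    token_ids: LongTensor of shape (batch, seq_len), drawn from a
    human-generated corpus. The loss only rewards matching the human
    continuation, which is what makes this imitation learning.
    """
    logits = model(token_ids[:, :-1])   # predict token t+1 from tokens <= t
    targets = token_ids[:, 1:]          # the actual human-written tokens
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
    )
```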
(Moreover, animals, including humans, probably do not primarily learn by imitation either. The currently most popular theory, predictive coding, says the brain predicts experiences rather than text. Experiences are not synthetic data, but they aren’t human-generated either: future experiences are directly grounded in and provided by physical reality, whereas text has only a very indirect connection to the external world and is always human-mediated. It’s plausible that a superhuman predictive coder would be superhumanly intelligent; it could, for example, evaluate complex subjunctive conditionals. Predictive coding could also lead to something like foom, for example via a model that learns with 1,000 remote robot bodies in parallel rather than with just one, as animals do.)
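As a very rough illustration of “predicting experiences rather than text” across many bodies in parallel, here is a sketch. The `world_model` and the environment interface are made up purely for illustration; the point is only that the training signal comes from what reality does next, not from human-written text.

```python
import torch
import torch.nn.functional as F

def predictive_coding_step(world_model, envs, actions):
    """One illustrative update over many embodiments in parallel.

    `world_model` and the env objects are hypothetical; each env exposes a
    current `observation` and a `step(action)` that returns the next
    observation provided by physical reality.
    """
    obs = torch.stack([torch.as_tensor(env.observation) for env in envs])
    predicted_next_obs = world_model(obs, actions)

    # Act in all bodies at once; reality supplies the targets.
    next_obs = torch.stack([
        torch.as_tensor(env.step(a)) for env, a in zip(envs, actions)
    ])

    # The prediction error is the learning signal (squared error is a
    # stand-in for whatever loss a real predictive-coding model would use).
    loss = F.mse_loss(predicted_next_obs, next_obs)
    loss.backward()
    return loss
```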
“The upper bound of what can be learned from a dataset is not the most capable trajectory, but the conditional structure of the universe implicated by their sum”.
Being able to perfectly imitate a chimpanzee would probably also require superhuman intelligence, but such a system would still only be able to imitate chimpanzees; effectively, it would be much less intelligent than a human. The same goes for imitating human text: the task is very hard, but succeeding at it wouldn’t yield large capabilities.
Do please read the post. Being able to predict human text requires vastly superhuman capabilities, because predicting human text requires predicting the processes that generated said text. And large tracts of text are just reporting on empirical features of the world.
Alternatively, just read the post I linked.
I did read your post. The fact that a task like predicting text requires superhuman capabilities of some sort does not mean that training on that task will result in superhuman capabilities. That’s the crucial point.
It is much harder to imitate a human writing text than to simply be a human writing text, but that doesn’t mean the imitation is any more capable than the original.
An analogy: the fact that building fusion power plants is much harder than building fission power plants doesn’t at all mean that the former are better; they could even be worse. There is a fundamental disconnect between the difficulty of a task and the usefulness of its result.
It depends on your ability to extract the information from the model. RLHF and instruction tuning are two such algorithms, which allow certain capabilities besides next-token prediction to be extracted from the model. I suspect many other search and extraction techniques will be found that can leverage latent capabilities and understanding in the model that aren’t reflected in its text outputs.
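For example, the reward-model half of RLHF optimizes for something the raw next-token objective never targets: scoring whole responses by human preference. A rough sketch of one preference-learning step (generic PyTorch-style, hypothetical tensors and `reward_model`, not any lab’s actual pipeline):

```python
import torch.nn.functional as F

def reward_model_step(reward_model, optimizer, chosen_ids, rejected_ids):
    """One illustrative RLHF preference step (Bradley-Terry style).

    `reward_model` maps a token sequence to a scalar score; `chosen_ids`
    and `rejected_ids` are hypothetical batches of human-preferred and
    dispreferred responses to the same prompts. The base model is then
    optimized (e.g. with PPO) against this learned reward.
    """
    reward_chosen = reward_model(chosen_ids)      # (batch,)
    reward_rejected = reward_model(rejected_ids)  # (batch,)

    # Maximize the margin by which preferred responses out-score rejected ones.
    loss = -F.logsigmoid(reward_chosen - reward_rejected).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```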
This approach doesn’t seem to work with in-context learning, and it is unclear whether fine-tuning would be more successful.
I think there are probably many approaches that don’t work.
I am aware of just three methods of modifying GPTs: in-context learning (prompting), supervised fine-tuning, and reinforcement-learning fine-tuning. The achievable effects seem rather similar.
There are many other ways to search the network in the literature, such as activation vectors, and I suspect we’re just getting started on these sorts of search methods.
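One simple recipe from the steering-vector literature: take the difference between the model’s activations on two contrasting prompts and add it into the residual stream at generation time. A rough sketch with GPT-2 via HuggingFace transformers (the layer choice and scaling factor here are illustrative, not tuned):

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
layer = model.transformer.h[6]              # which block to steer is a free choice

def last_token_activation(text):
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, output_hidden_states=True)
    return out.hidden_states[7][0, -1]      # residual stream after block 6, last token

# Steering vector: difference of activations on two contrasting prompts.
steering = last_token_activation("I love this") - last_token_activation("I hate this")

def add_steering(module, inputs, output):
    # GPT-2 blocks return a tuple; output[0] holds the hidden states.
    return (output[0] + 4.0 * steering,) + output[1:]

handle = layer.register_forward_hook(add_steering)
prompt = tokenizer("The movie was", return_tensors="pt").input_ids
print(tokenizer.decode(model.generate(prompt, max_new_tokens=20)[0]))
handle.remove()
```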