An AGI-level reasoner will appear sooner than an agent; what humanity does with this reasoner is critical
Recent advances in language models, Gato, Minerva, etc. suggest to me that an AGI-level “predictor”/”reasoner”/generative model will arrive sooner than an “agent”. There is some subtlety, though, w.r.t. Gato, which is an “agent” but decidedly not an RL agent: it doesn’t intrinsically value exploration, and it doesn’t learn or bootstrap itself; instead it does imitation learning from other (actual RL) agents.
I feel that this reasoner may suddenly generalise to astonishing capability, e.g., proving math theorems far beyond human comprehension, deriving the theory of everything from data, and advancing the state of the art in political science, ethics, and engineering, while remaining a non-agent and thus relatively safe.
I feel that it will be very important then to resist turning such an AI into an agent by attaching actuators to it and giving it tasks. However, this will be hard, because economic incentives for doing so will be enormous.
This shortform is probably a vague expression of an idea that is much better thought through here: Conditioning Generative Models for Alignment.
Would the AGI reasoner be of significant assistance to the computer programmers who work on improving the reasoner?
It seems plausible to me that part of general intelligence requires a form of causal learning through active interventions. Indeed, most causal networks are not recoverable from purely statistical information.
This will make these systems much more like agents than oracles.
Judea Pearl would probably disagree: in “The Book of Why” he explains that causal effects are often recoverable from statistics, and that randomised controlled trials are often unnecessary.
Huh.
Judea Pearl is well known for pointing out that not all causal relations are recoverable from observational data alone. See, e.g., his seeing/doing/imagining hierarchy (the ladder of causation).
Yes, they are not all recoverable. Per Pearl, researchers should first come up with a scientific hypothesis about the causal model (which variables are causes, which are effects), and then verify or refute it with the help of statistical data. The first step is fundamentally subjective (which opens a non-trivial debate about Pearl’s views on the nature of causality, and epistemology more generally). But the second step often doesn’t require collecting new data.
So, an AGI model can contemplate such hypotheses just as well as human researchers. Whether the hypotheses are “right” is the wrong question; the right question is whether they give us the power to answer certain questions.
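To make the two-step recipe concrete, here is a minimal sketch in Python (the variables, the assumed graph Z → X, Z → Y, X → Y, and the simulated data are all hypothetical): once the researcher postulates that Z is the only confounder of X and Y, the interventional quantity P(Y=1 | do(X=1)) can be computed from purely observational data via the back-door adjustment formula, with no new experiment required.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Step 1 (subjective): postulate the causal graph Z -> X, Z -> Y, X -> Y,
# i.e. Z is the only confounder of the treatment X and the outcome Y.
z = rng.integers(0, 2, n)                                    # confounder
x = (rng.random(n) < 0.2 + 0.6 * z).astype(int)              # treatment depends on Z
y = (rng.random(n) < 0.1 + 0.3 * x + 0.4 * z).astype(int)    # outcome depends on X and Z

# Step 2 (statistical): identify P(Y=1 | do(X=1)) from observational data
# alone via the back-door adjustment: sum_z P(Y=1 | X=1, Z=z) * P(Z=z).
p_do = sum(y[(x == 1) & (z == zv)].mean() * (z == zv).mean() for zv in (0, 1))

naive = y[x == 1].mean()   # plain conditional P(Y=1 | X=1), confounded by Z
print(f"back-door estimate of P(Y=1 | do(X=1)): {p_do:.3f}")   # ~0.60
print(f"naive conditional P(Y=1 | X=1):         {naive:.3f}")  # noticeably higher
```

The estimate matches the true interventional effect, while the naive conditional probability is biased upward because Z raises the probability of both X and Y.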
I like your idea that economic incentives will become the safety bottleneck more so than corrigibility. Many would argue that a pure reasoner actually can influence the world through e.g. manipulation, but this doesn’t seem very realistic to me if the model is memoryless and doesn’t have the ability to recursively ask itself new questions.
Adding such capabilities is fairly easy, however, which is what your concern is about.
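To illustrate how little scaffolding is involved, here is a hypothetical sketch (the `reasoner` function is a stand-in for any stateless question-answering model, not a real API): bolting on external memory and letting the model generate its own follow-up questions already yields an open-ended agentic loop.

```python
from typing import List

def reasoner(prompt: str) -> str:
    """Placeholder for a single stateless call to a generative model;
    swap in a real model call here."""
    return "<model output>"

def agent_loop(goal: str, steps: int = 10) -> List[str]:
    """Wrap the stateless reasoner with persistent memory and
    self-generated follow-up questions."""
    memory: List[str] = []                  # bolted-on external memory
    question = goal
    for _ in range(steps):
        context = "\n".join(memory[-20:])   # feed the model its own recent outputs
        answer = reasoner(f"{context}\nQ: {question}\nA:")
        memory.append(f"Q: {question}\nA: {answer}")
        # let the model pose its own next question, closing the loop
        question = reasoner(f"{context}\nGiven the goal '{goal}', "
                            "what question should be asked next?")
    return memory
```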
I have pointed this out to folks in the context of AI timelines: Metaculus gives predictions for “weak AGI”, but I consider a hypothetical Gato-X that can generalize to one or many tasks outside its training distribution to be AGI, yet still a considerable way from an AGI with enough agency to act on its own.
OTOH, it isn’t much reassurance if as little as a batch script to keep such a system running is enough to bootstrap it up to agency.
But the window between weak AGI and agentic AGI is a prime learning opportunity, and the lesson is that we should do everything we can to prolong it once weak AGI is invented.
Also, perhaps someone should study the necessary components of an AGI takeover by simulating agent behavior in a toy model. At the least you need a degree of agency, probably a self-model in order to recursively self-improve, and the ability to generalize. Knowing what the necessary components are might enable us to take steps to avoid having them all in one system at once.
If anyone has ever demonstrated, or even systematically described, what those necessary components are, I haven’t seen it done. Maybe it is an infohazard but it also seems like necessary information to coordinate around.
Yes, in this interview, Connor Leahy said he has an idea of what these components are, but he wouldn’t say publicly.
We already have (very rare) human “reasoners” who can see brilliant opportunities to break free from the status quo, and do new things with existing resources (Picasso, Feynman, Musk, etc.). There must be millions of hidden possibilities to solve our problems that no one has thought of.
This seems almost there, in terms of what you were suggesting?
https://www.sciencealert.com/ai-has-discovered-alternate-physics-on-its-own/amp