“Real AGI”
I see “AGI” used for everything from existing LLMs to superintelligence, and massive resulting confusion and illusory disagreements. I finally thought of a term I like for what I mean by AGI. It’s an acronym that’s also somewhat intuitive without reading the definition:
Reasoning, Reflective Entities with Autonomy and Learning
might be called “(R)REAL AGI” or “real AGI”. See below for further definitions.
Hoping for AI to remain hobbled by being cognitively incomplete looks like wishful thinking to me. Nor can we be sure that these improvements and the resulting dangers won’t happen soon.
I think there are good reasons to expect that we get such “real AGI” very soon after we have useful AI. After 2+ decades of studying how the brain performs complex cognition, I’m pretty sure that our cognitive abilities are the result of multiple brain subsystems and cognitive capacities working synergistically. A similar approach is likely to advance AI.
Adding these other cognitive capacities creates language model agents/cognitive architectures (LMCAs). Adding each of these seems relatively easy (compared to developing language models) and almost guaranteed to add useful (but dangerous) capabilities.
More on this and expanded arguments and definitions in an upcoming post; this is primarily a reference for this definition of AGI.
Reasoning
Deliberative “System 2” reasoning allows humans to trade off “thinking time” for accuracy.
Aggregating cognition for better results can be added to nearly any system in fairly straightforward ways.
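As a concrete illustration (not a claim about any particular system), here is a minimal sketch of one such aggregation scheme: sample several independent reasoning chains and majority-vote on their final answers, trading compute for accuracy. The `sample_chain` and `extract_answer` functions are assumed placeholders for a model call and an answer parser.

```python
from collections import Counter
from typing import Callable

def aggregate_answer(question: str,
                     sample_chain: Callable[[str], str],
                     extract_answer: Callable[[str], str],
                     n_samples: int = 8) -> str:
    """Trade 'thinking time' for accuracy: sample several independent
    reasoning chains and majority-vote on their final answers."""
    answers = []
    for _ in range(n_samples):
        chain = sample_chain(question)         # one sampled "System 2" reasoning trace
        answers.append(extract_answer(chain))  # parse out the final answer string
    return Counter(answers).most_common(1)[0][0]
```

More samples buy more accuracy at the cost of more inference time, which is exactly the System 2 trade-off described above.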
Reflective
Can think about their own cognition.
Useful for organizing and improving cognition.
Reflective stability of values/goals has important upsides and downsides for alignment
Entities
Evokes the intuition of a whole mind, rather than a piece of a mind or a cognitive tool
with Autonomy
Acts independently without close direction.
Very useful for getting things done efficiently
Has agency in that it takes actions and pursues explicit goals with flexible strategies
Including deriving and pursuing explicit, novel subgoals
This is highly useful for factoring novel complex tasks (a minimal sketch of such a loop follows this list)
One useful subgoal is “make sure nobody prevents me from working on my main goal.” ;)
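To make the "explicit goals, flexible strategies, novel subgoals" picture concrete, here is a toy sketch of such a loop; it is a sketch only, not any project's actual design. `llm` and `act` are assumed stand-ins for a language model call and for whatever executes actions, and the prompts are placeholders.

```python
from typing import Callable, List

def pursue_goal(goal: str,
                llm: Callable[[str], str],
                act: Callable[[str], str],
                max_steps: int = 20) -> List[str]:
    """Toy agent loop: derive explicit subgoals, pursue them with flexible
    strategies, and re-plan as results come in. Returns a log of results."""
    subgoals = [s for s in llm(f"List subgoals, one per line, for: {goal}").splitlines() if s.strip()]
    log: List[str] = []
    for _ in range(max_steps):
        if not subgoals:
            break
        current = subgoals.pop(0)
        action = llm(f"Goal: {goal}\nSubgoal: {current}\nPropose the next action.")
        result = act(action)                   # execute the proposed action somewhere
        log.append(result)
        # Reflect on the result and derive any new subgoals (re-planning)
        new = llm(f"Result: {result}\nList any new subgoals needed for: {goal} (or NONE).")
        if new.strip().upper() != "NONE":
            subgoals.extend(s for s in new.splitlines() if s.strip())
    return log
```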
and Learning
Continuous learning from ongoing experience
Humans have at least four types; LMCAs currently have one and a fraction.[1]
These are all straightforwardly implementable for LMCA agents and probably for other potential network AGI designs.
AGI: Artificial (FULLY) General Intelligence
All of the above are (arguably) implied by fully general intelligence:
Humans can think about anything with some success.
That includes thinking about their own cognition (Reflection), which enables Reasoning by allowing strategic aggregation of cognitive steps.
It requires online Learning to think about topics not in the previous training set.
It almost requires goal-directed Autonomy to gather useful new information and, arguably, to take “mental actions” that travel conceptual space strategically.
Those together imply an Entity that is functionally coherent and goal-directed.
You could drop one of the Rs or aggregate them if you wanted a nicer acronym.
The above capacities are often synergistic, in that having each makes others work better. For instance, a “real AGI” can Learn important results of its time-consuming Reasoning, and can Reason more efficiently using Learned strategies. The different types of Learning are synergistic with each other, etc. More on some potential synergies in Capabilities and alignment of LLM cognitive architectures; the logic applies to other multi-capacity AI systems as well.
I like two other possible terms for the same definition of AGI: “full AGI” for artificial fully general intelligence; or “parahuman AGI” to imply having all the same cognitive capacities as humans, and working with humans.
This definition is highly similar to Steve Byrnes’ in “Artificial General Intelligence”: an extremely brief FAQ, although his explanation is different enough to be complementary. It does not specify all of the same cognitive abilities, and provides different intuition pumps. Something like this conception of advanced AI appears to be common in most treatments of aligning superintelligence, but not in prosaic alignment work.
More on the definition and arguments for the inclusion of each of those cognitive capacities will be included in a future post, linked here when it’s done. I wanted to get this out and have a succinct definition. Questions and critiques of the definitions and claims here will make that a better post.
All feedback is welcome. If anyone’s got better terms, I’d love to adopt them.
Edit: title changed in a fit of indecision. Quotation marks used to emphasize that it’s a definition, not a claim about what AGI “really” is.
[1] Types of continuous learning in humans (and language model cognitive architecture (LMCA) equivalents):
Working memory
(LMCAs have the context window)
Semantic memory/habit learning
(Model weight updates from experience)
Episodic memory for important snapshots of cognition
(Vector-based text memory is this, but poorly implemented; a bare-bones sketch follows this footnote)
Dopamine-based RL using a powerful critic
(self-supervised RLAIF and/or RLHF during deployment)
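For concreteness, here is a bare-bones sketch of the vector-based episodic memory referred to in the footnote, assuming a hypothetical `embed` function that maps text to a vector. This is the existing, admittedly crude approach, not the improved implementation alluded to elsewhere in the comments.

```python
import numpy as np
from typing import Callable, List, Tuple

class EpisodicMemory:
    """Store text 'snapshots' of cognition with embedding vectors and
    retrieve the most similar snapshots later by cosine similarity."""

    def __init__(self, embed: Callable[[str], np.ndarray]):
        self.embed = embed                     # assumed text -> vector function
        self.items: List[Tuple[str, np.ndarray]] = []

    def store(self, snapshot: str) -> None:
        self.items.append((snapshot, self.embed(snapshot)))

    def recall(self, query: str, k: int = 3) -> List[str]:
        q = self.embed(query)

        def cosine(v: np.ndarray) -> float:
            return float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v) + 1e-9))

        ranked = sorted(self.items, key=lambda item: cosine(item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]
```

An agent would call `store` after notable events and `recall` before acting; the human implementation is much more selective about what gets stored and how it is retrieved.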
Why do you think integrating all these abilities is easier than building an LLM? My intuition is that it is much harder, though perhaps most necessary progress for both is contained in scaling compute and most of that scaling has probably already taken place.
Basically because the strengths of LLMs can be leveraged to do most of the work in each case.
OpenAI’s o1 shows how straightforward adding reasoning was once LLMs reached GPT-4-level capability.
Reflection is merely allowing the system to apply its intelligence to its own cognition; it’s questionable whether that’s worth categorizing as a different ability at all.
Autonomy already exists, and existing LLMs do effectively choose subgoals; they need to get better at it to be useful or dangerous, but they are getting better.
I gave the rough scheme for all of the types of learning except episodic memory; I won’t say more because I don’t want to improve capabilities. I wouldn’t even say this much if I didn’t strongly suspect that language model agents/cognitive architectures are our best shot at aligned AGI. Suffice it to say that I expect people to be working on the improvement to episodic memory I’m thinking of now, and to release it soon.
I happen to have a little inside information that one of the other types of learning is working very well in at least one project.
If you don’t believe me, I ask you: can you be sure they won’t? Shouldn’t we have some plans together just in case I’m right?
I’m not sure they won’t turn out to be easy relative to inventing LLMs, but under my model of cognition there’s a lot of work remaining. Certainly we should plan for the case that you’re right, though that is probably an unwinnable situation, so it may not matter.
The chances of this conversation advancing capabilities are probably negligible: there are thousands of engineers pursuing the plausible-sounding approaches. But if you have a particularly specific or obviously novel idea, I respect keeping it to yourself.
Let’s revisit the o1 example after people have had some time to play with it. Currently I don’t think there’s much worth updating strongly on.
You don’t think a 70% reduction in error on problem solving is a major advance? Let’s see how it plays out. I don’t think this will quite get us to RREAL AGI, but it’s going to be close.
I couldn’t disagree more with your comment that this is an unwinnable scenario if I’m right. It might be our best chance. I’m really worried that many people share the sentiment you’re expressing, and that’s why they’re not interested in considering this scenario closely. I have yet to find any decent arguments for why this scenario isn’t quite possible. It’s probably the single likeliest concrete AGI scenario we might predict now. It makes sense to me to spend some real effort on the biggest possibility we can see relatively clearly.
It’s far from unwinnable. We have promising alignment plans with low taxes. Instruction-following AGI is easier and more likely than value aligned AGI, and the easier part is really good news. There’s still a valid question of If we solve alignment, do we die anyway?, but I think the answer is probably that we don’t: it becomes a political issue, but a solvable one.
More on your intuition that integrating other systems will be really hard in the other threads here.
Okay, I notice these two things are in tension with each other:
If you think they’re already planning to do this, why do you think that keeping the ideas to yourself would meaningfully advance capabilities by more than a few months?
More generally, I think there’s a conflict between thinking that other people will do what you fear soon, and not saying what you think they are doing, because it means that if your idea worked, people would try to implement it anyway and you would only have bought yourself a few months, which isn’t usually enough for interventions.
I think you’re right that there’s an implied tension here.
Because my biggest fear is proliferation, as we’ve been discussing in If we solve alignment, do we die anyway?. The more AGIs we get coming online around the same time, the worse off we are. My hope after the discussion in that post is that whoever is ahead in the AGI race realizes that wide proliferation of fully RSI-capable AGI is certain doom, and uses their AGI for a soft pivotal act, with many apologies and assurances to the rest of the world that it will be used for the betterment of all. They’ll then have to do that, to some degree at least, so the results could be really good.
As for only a few months of time bought, I agree. But it would tighten the race and so create more proliferation. As for interventions, I don’t actually think any interventions that have been discussed or will become viable would actually help. I could be wrong on this easily; it’s not my focus.
I see a path to human survival and flourishing that needs no interventions and no work, except that it would be helpful to make sure the existing schemes for aligning language model cognitive architectures (and similar proto-AGIs) to personal intent are actually going to work. They look to me like they will, but I haven’t gotten enough people to actually consider them in detail.
I think a crux here is that, even conditional on your ideas working IRL, in practice there will only be 3-4 AGIs, built by the big labs, plausibly limited to Deepmind, OpenAI, and Anthropic, because the power and energy cost of training future AIs will become a huge bottleneck by 2030.
Another way to say it: I think open source will just totally lose the race to AGI, so I only consider Meta, Deepmind, Anthropic, and OpenAI relevant actors for the next several years; these are the only ones that could actually execute on your ideas even if they proliferated.
This part is really key to emphasize. I see lots of people making arguments around things like, “In my project we will keep it safe by carefully developing necessary components A and B, but not C or D.” And they don’t talk about the other groups working on B and C, or on A and D.
There are so many pieces out there covering so many of the seemingly required aspects of such a fully competent (R)REAL AGI. As engineers who are familiar with the concept of ‘Integration Hell’ will assure you, it’s often not trivial to hook up many different components into a functioning whole. However, it’s also generally not impossible, if the components work together in theory. It’s a predictably solvable challenge.
What’s more, the closer you get to the (R)REAL AGI, the more your work gets accelerated / made easier by the assistance from the thing you are building. This is why the idea of ‘get close, but then stop short of danger’ seems like a risky bet. At that point it will be quite easy for a wide set of people to use the system itself to go the remainder of the distance.
Good point on different teams simultaneously developing different cognitive capabilities. Further efforts will also copy the most successful techniques, and combine them in a single LMC architecture. The synergies might make the rate of progress surprising at some point. Hopefully not an important point...
I agree that the claims about the rapidity and inevitability of real AGI are the important part.
But that’s trickier, so it’s not the point of this post.
One thing I feel stuck on is that part of why I’m so convinced this stuff happens pretty rapidly is that I see pretty direct routes to each. They’re not that clever, but they’re not totally obvious.
Beyond sharing good ideas, just spreading the belief that this stuff is valuable and achievable will speed up progress.
Any insight on either of these would be appreciated. But I will write another post posing those questions.
Hooking up AI subsystems is predictably harder than you’re implying. Humans are terrible at building AGI; the only thing we get to work is optimization under minimal structural assumptions. The connections between subsystems will have to be learned, not hardcoded, and that will be a bottleneck. Very possibly a somehow-unified system trained in a somewhat clever way will get there first.
You really think humans are terrible at building AGI after the sudden success of LLMs? I think success builds on success, and (neural net-based) intelligence is turning out to be actually a lot easier than we thought.
I have been involved in two major projects hooking up different components of cognitive architectures. It was a nightmare, as you say. Yet there are already rapid advances in hooking up LLMs to different systems in different roles, for the reasons Nathan gives: their general intelligence makes them better at controlling other subsystems and taking in information from them.
Perhaps I should qualify what I mean by “easy”. Five years is well within my timeline. That’s not a lot of time to work on alignment. And less than five years for scary capabilities is also quite possible. It could be longer, which would be great, but shouldn’t at least a significant subset of us be working on the shortest realistic timeline scenarios? Giving up on them makes no sense.
I’m not convinced that LLM agents are useful for anything.
Me either!
I’m convinced that they will be useful for a lot of things. Progress happens.
Eventually, but agency is not sequence prediction + a few hacks. The remaining problems are hard. Massive compute, investment, and enthusiasm will lead to faster progress; I objected to 5-year timelines after ChatGPT, but now it’s been a couple of years. I think 5 years is still too soon, but I’m not sure.
Edit: After Nathan offered to bet that my claim is false, I bet no on his market at 82%, which claims (roughly) that inference compute is as valuable as training compute for GPT-5: https://manifold.markets/NathanHelmBurger/gpt5-plus-scaffolding-and-inference. I expect this will be difficult to resolve because o1 is the closest we will get to a GPT-5, and it presumably benefits from both more training (including RLHF) and more inference compute. I think it’s perfectly possible that well-thought-out reinforcement learning can be as valuable as pretraining, but for practical purposes I expect scaling inference compute on a base model will not see qualitative improvements. I will reach out about more closely related bets.
For dumb subsystems, yes. But the picture changes when one of the subsystems is general intelligence. Putting an LLM in charge of controlling a robot seems like it should be hard, since robotics is always hard… and yet, there’s been a rash of successes with this recently as LLMs have gotten just-barely-general-enough to do a decent job of this.
So my prediction is that as we make smarter and more generally capable models, a lot of the other specific barriers (such as embodiment, or emulated keyboard/mouse use) fall away faster than you’d predict from past trends.
So then the question is, how much difficulty will there be in hooking up the subsystems of the general intelligence module: memory, recursive reasoning, multi-modal sensory input handling, etc. A couple years ago I was arguing with people that the jump from language-only to multi-modal would be quick, and also that soon after one group did it that many others would follow suit and it would become a new standard. This was met with skepticism at the time, people argued it would take longer and be more difficult than I was predicting and that we should expect the change to happen further out into the future (e.g. > 5 years) and occur gradually. Now vision+language is common in the frontier models.
So yeah, it’s hard to do such things, but like… it’s a challenge which I expect teams of brilliant engineers with big research budgets to be able to conquer. Not hard like I expect them to try their best, but fail and be completely blocked for many years, leading to a general halt of progress across all existing teams.
For what it’s worth, though I can’t point to specific predictions, I was not at all surprised by multi-modality. It’s still a token prediction problem; there are no fundamental theoretical differences. I think that modestly more insights are necessary for these other problems.
This made me think about how this will come about: whether we will have multiple discrete systems for different functions (language, image recognition, physical balance, executive functions, etc.) working interdependently, communicating through compressed-bandwidth conduits, or whether at some point we can/need to literally chuck all the raw data from all the systems into one learning system and let that sort it out (likely creating its own virtual semi-independent systems).
Right. That remains to be seen. Efforts are progressing in both directions, and either one could work.
I think it’s a useful & on-first-read coherent definition; my main suggestion is to not accompany it with ‘AGI’; we already have too many competing definitions of AGI for another one to be useful. Holden Karnofsky’s choice to move away from that with ‘Transformative AI’ made it much easier to talk about clearly.

What are the four types? I can imagine breaking that up along different axes.
Here is a rough draft of an expanded version that would try to justify the claims more, rather than just stating the definition and the claims.
Here are two starts at detailing those types of memory:
and Learning (alternate draft)
Meaning continuous learning from ongoing experience
Humans have at least four types:
Episodic memory for specific snapshots of cognition
Useful for pinpointing important information, beliefs, and strategies for rapid single-shot learning
Currently implemented as vector-based memories for LLM agents/cognitive architectures;
better implementations are relatively straightforward and should be expected soon
Semantic learning/habit learning
Equivalent to continuously training an LLM’s weights with examples from its deployment
Reinforcement learning
Learning not just what happened, but what led to reward or to situations that predict reward
Ongoing RLHF from user feedback is one of many possible straightforward implementations.
Working memory.
Temporary storage
Implemented by short-term persistent activation and short-term weight changes in cortex
Roughly, the context window in LLMs
Curiously, the limitation in both seems to be analogous: it’s not how much you can fit in there, which is a lot, but how much you can usefully get out.
Seven is NOT the correct average estimate of working memory capacity, except for things like phone numbers you can repeat vocally or subvocally.
Four is a more common general estimate, but it’s clear that the measured capacity depends on retrieval difficulty from proactive interference.
Most lab tasks use highly overlapping, repeated stimuli, e.g., the same colors or numbers repeated in different patterns
Measures of working memory in text comprehension can be interpreted as showing nearly unlimited capacity, probably because the semantics barely overlap among different items.
I know less about benchmarks for retrieval from context windows, but have heard that there are similar limitations on retrieving more than a handful of items appropriately.
Learning (second alternate draft)
Here I mean continuous learning from experiences during deployment. Current LLMs barely do this; they accumulate context, but this isn’t quite learning, just using more extensive inputs to make better outputs. GPT-4o now makes guesses at important information to put into “memory” (probably just added to the context window/prompt), which appears to be fairly useful.
Humans have multiple types of continuous learning. Each could be implemented in conjunction with LLM agents/cognitive architectures, so I expect them to be implemented soonish.
Episodic memory for specific events.
This is a “snapshot” of a moment of cognition
Current vector-based memory in LLM agents does this, but very poorly compared to the human implementation. A more human-like system is relatively straightforward, but explaining why I think that would require explaining how to implement it, which I’m not going to do. Plenty of people understand this, but they tend to not be the people implementing AGI agents.
Updating semantic knowledge continuously
Even if I’m a dense amnesic without any working episodic memory (because my medial temporal lobe has been severely damaged), I’ll be able to use new knowledge if it’s repeated a few times. This is something like adding every user correction to the training set for the next training run, or running training continuously.
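A trivial sketch of the “add every user correction to the training set for the next training run” option; the file name and record format here are made up for illustration.

```python
import json

def log_correction(prompt: str, corrected_response: str,
                   path: str = "continual_finetune.jsonl") -> None:
    """Append a user correction as a fine-tuning example so the model's
    weights can be updated from deployment experience in a later training run."""
    record = {"prompt": prompt, "completion": corrected_response}
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
```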