The term “AGI” is now creating confusion. When it’s used in the context of timelines or alignment, we don’t know whether it means near-future LLMs or superintelligence. It’s fair to use AGI as “fairly general AI at near-human level,” which includes current LLMs. But we should have a distinguishing term for the stronger use of AGI, because the implications for change and alignment are very different.
A fully general AI could think about any topic, with some important implications for alignment:
Topics outside of their training set:
Requires self-directed, online learning
Alignment may shift as knowledge and beliefs shift with learning
Their own beliefs and goals:
Alignment must be reflexively stable
Their context and cognition:
Alignment must be sufficient for contextual awareness and potential self-improvement
Actions:
Agency is implied or trivial to add
I think we’ll create fully general AI very soon after we create limited general AI, like LLMs. Adding the above capabilities is:
Useful
Easy
Fascinating
More on each below.
Aligning more limited systems is important, but not likely to be adequate. So we should be clear which one we’re talking about.
I’ve been able to think of a few terms, but none are really satisfactory. I’ll add some in the answers, but I’d like to see independent thoughts first.
So: what’s a better term for strong AGI?
Why we might want to focus on strong AGI risks and alignment.
You can ignore this section if you’re already convinced we could use distinguishing terminology.
Some people think existential risk from AGI is less than 1%, while others think it is above 99%. There are many reasons for disagreement, but one big reason is that we are talking about different things.
It would be easier to convince people that AI could become dangerous if we focused discussion on AI that has all of humans’ cognitive abilities and more. It’s intuitively apparent that such an entity is dangerous, because humans are dangerous.
I think the unworried are often thinking of AI as a tool, while the worried are thinking about future AI that is more like a new intelligent species.
That distinction is highly intuitive. We deal with both tools and agents every day, with little overlap. Humans have rich intuitions about intelligent, goal-directed agents, since so much of our lives involves dealing with other humans. And yet, I don’t say “human-like AI” because I don’t want to invoke the intuitions that don’t apply: humans are all more-or-less similar in intelligence, can’t duplicate or upgrade themselves easily, etc.
Tools can be dangerous. Nuclear weapons are tools. But they are dangerous in a different way than a goal-directed, agentic threat. A bomb can go off if a human makes a decision or a mistake. But a tiger may eat you because it is hungry, wants to eat you, and can figure out how to find you and overcome your defenses. It[1] has an explicit goal of eating you, and some intelligence that will be applied to accomplishing that goal. It does not hate you, but it has other goals (survival) that make killing you an instrumental subgoal.
Tool AI warrants fewer and different worries than fully general AI. Mixing the two together in public discourse can make the worriers sound paranoid.
I think it’s also necessary to address another reason we aren’t doing this already: tool AI can be dangerous, so we don’t want to limit the discussion to only highly agentic, fully sapient AI. And there’s not a sharp distinction between the two; an oracle AI may have implicit goals and motivations.
But by failing to draw this distinction, we’re confusing the discussion. If we got people to consider the question “IF we made fully agentic AI, with every human cognitive ability and then some, THEN would I be concerned for our safety?” that would be a big win, because the answer is obviously yes. The discussion could then move on to a more specific and productive debate: “will we do such a thing?”
There I think the answer is also yes, and soon, but that’s another story.[2]
In sum, discussion of risk models specific to strong AGI seems helpful for both internal and public-facing discourse. So again: what’s a better term for the really dangerous sort of AI?
- ^
This may be a poor metaphor for modern-day humans who have forgotten what it’s like to have other large intelligent predators nearby; we could substitute a human threat, at risk of pulling in unwanted implications.
- ^
It seems like the first thing we’ll do with powerful oracle AI (like better LLMs/foundation models) is use it to emulate an agent with those attributes. With a smart-enough oracle, that’s as simple as asking the question “what would you do if you were a self-aware, self-reflective entity with the following goals and properties?”; feeding its outputs into whatever UIs we want; and iterating that prompt along with new sensory inputs as needed.
In practice, I think there are many scaffolding shortcuts we’ll take rather than merely developing tool AI until it is trivial to turn it into an agent. Current LLMs are like an intelligent human with complete destruction of the episodic memory areas in the medial temporal lobe, and severe damage to the frontal lobes that provide executive function for flexible goal-direction. There are obvious and easy routes to creating systems that scaffold foundation models with those capabilities, as well as sensory and effector systems, and associated simulation abilities.
Thus, I think the risks from tool AI are real but probably not worth much of our worry budget; we will likely be eaten by a tiger of our own creation long before we can invent and mishandle an AI nuke. And there will be no time for that tiger to emerge from a tool system, because we’ll make it agentic on purpose before agency emerges. I’m even less worried about losing a metaphorical toe to a metaphorical AI adze in the meantime, although that could certainly happen.
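A minimal Python sketch of the oracle-to-agent loop described in the footnote, assuming a hypothetical `oracle` callable that maps a prompt string to a text answer (a stand-in for any foundation model; no real model API is used). `ScaffoldedAgent`, `build_prompt`, `step`, and `run` are illustrative names, not an existing library. The `memory` list is a crude stand-in for the episodic memory that current LLMs lack, and re-prompting with the same goals on every new observation supplies a rough form of persistent goal-direction.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical oracle: any function from a prompt string to a text answer.
# In a real system this would wrap a call to a foundation model.
Oracle = Callable[[str], str]


@dataclass
class ScaffoldedAgent:
    oracle: Oracle
    goals: str
    # Crude episodic memory: a running log of observations and chosen actions.
    memory: List[str] = field(default_factory=list)
    max_memory: int = 20  # how many recent memory entries go into each prompt

    def build_prompt(self, observation: str) -> str:
        # Ask the oracle to role-play an agent with the given goals,
        # including recent memory so each call sees what came before.
        recent = "\n".join(self.memory[-self.max_memory:])
        return (
            "What would you do if you were a self-aware, self-reflective entity "
            f"with the following goals and properties?\nGoals: {self.goals}\n"
            f"Memory:\n{recent}\nNew observation: {observation}\nNext action:"
        )

    def step(self, observation: str) -> str:
        # One iteration: prompt the oracle, then record what happened.
        action = self.oracle(self.build_prompt(observation))
        self.memory.append(f"observed: {observation}")
        self.memory.append(f"did: {action}")
        return action


def run(agent: ScaffoldedAgent, observations: List[str]) -> List[str]:
    # Iterate the prompt with each new sensory input. A real scaffold would
    # route each action to effectors or UIs instead of just collecting it.
    return [agent.step(obs) for obs in observations]
```

With a toy oracle such as `lambda p: f"step {p.count('observed:')}"`, calling `run(agent, ["a", "b", "c"])` returns `["step 0", "step 1", "step 2"]`, showing that each call sees the memory accumulated so far. Nothing in the loop itself is intelligent; whatever agency appears comes from the oracle plus this thin wrapper.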
I think the kind of AI you have in mind would be able to:
continue learning after being trained
think in an open-ended way after an initial command or prompt
have an ontological crisis
discover and exploit signals that were previously unknown to it
accumulate knowledge
become a closed-loop system
The best term I’ve thought of for that kind of AI is Artificial Open Learning Agent.
There are so many considerations in the design of AI. AGI was always far too general a term, and when people use it, I often ask what they mean; usually it’s a “human-like or better than human chatbot”. Other people say it’s the “technological singularity”, i.e. it can improve itself. These are obviously two very different things, or at least two very different design features.
Saying “My company is going to build AGI” is like saying “My company is going to build computer software”. The best software for what exactly? What kind of software to solve what problem? What features? Usually the answer from AGI fans is “all of them”, so perhaps the term is just inherently vague by definition.
When talking about AI, I think it’s more useful to talk about what features a particular implementation will or won’t have. You have already listed a few.
Here are some AI feature ideas of my own:
Ability to manipulate the physical world
Ability to operate without human prompting
Be “always on”
Have its own goals
Be able to access large additional computing resources for additional “world simulations” or for conducting virtual research experiments or spawning sub-processes or additional agents.
Be able to improve/train “itself” (really there is no “itself”, since as many copies can be made as needed, and it’s then unclear which one is the original “it”)
Be able to change its own beliefs and goals through training or some other means (scary one)
Ability to do any or some of the above completely unsupervised and/or unmonitored
I think any useful terminology will probably be some sort of qualification. But it needs to be much more limited than the above specifications to be useful.
Spelling out everything you mean in every discussion is sort of the opposite of having generally-understood terminology.
Since I haven’t gotten any suggestions yet, here are some of my collected favorites, all with flaws:
Parahuman AGI—Loosely invokes being like-but-different-than humans, and working alongside humans. Still too vague, but I just can’t find any brief terminology.
REAL AGI—Reflexive Entity with Agency and Learning that is Artificially Generally Intelligent. Grammatically clumsy.
Self Aware AGI—Captures one central bit, vaguely implies agency. Leaves out the rest.
Sapient AI—Invokes similarity to Homo sapiens. I wrote about this concept and terminology in Sapience, understanding, and “AGI”. Now I think it’s too vague. Perhaps “silico sapiens”, but that sounds so forced and silly.
Universal AI—Technically exactly how I’m defining it, but sounds like it’s implying that it understands everything already, rather than being able to learn about and thereby think about anything.
Sentient AI—The vague etymology is worse than sapient, but it’s in more common usage. It historically leans more toward feeling where sapient leans more toward understanding, but they’re both used in vague and conflicting ways and don’t have clean etymologies.
Artificial Fully General Intelligence—More or less the definition I’ve taken AGI to have, but doesn’t intuitively imply the important stuff like goal-directed agency and contextual awareness.
Artificial Individuals—Captures the right intuitions but abbreviates right back to AI.
Intelligent Artificial Minds—Seems to capture the right intuitions, and I like the implications of the abbreviation IAM.
Synthetic Entities or Artificial entities also came to mind.
Long-horizon AI: an AI that can keep working productively on a nebulous task indefinitely
I’ve finally got one I like better: Superhuman Entities.
SHE is coming.