[Question] What’s a better term now that “AGI” is too vague?

The term “AGI” is now creating confusion. When it’s used in the context of timelines or alignment, we don’t know whether it means near-future LLMs or superintelligence. It’s fair to use AGI to mean “fairly general AI at near-human level,” which includes current LLMs. But we should have a distinguishing term for the stronger sense of AGI, because the implications for change and for alignment are very different.

A fully general AI could think about any topic, with some important implications for alignment:

  • Topics outside of its training set:

    • Requires self-directed, online learning

      • Alignment may shift as knowledge and beliefs shift with learning

  • Its own beliefs and goals:

    • Alignment must be reflexively stable

  • Its context and cognition:

    • Alignment must be sufficient for contextual awareness and potential self-improvement

  • Actions:

    • Agency is implied or trivial to add

I think we’ll create fully general AI very soon after we create limited general AI, like LLMs. Adding the above capabilities is:

  • Useful

  • Easy

  • Fascinating

More on each below.

Aligning more limited systems is important, but not likely to be adequate on its own. So we should be clear about which kind of system we’re talking about.

I’ve been able to think of a few terms, but none are really satisfactory. I’ll add some in the answers, but I’d like to see independent thoughts first.

So: what’s a better term for strong AGI?


Why we might want to focus on strong AGI risks and alignment

You can ignore this section if you’re already convinced we could use distinguishing terminology.

Some people think existential risk from AGI is less than 1%, while others think it is above 99%. There are many reasons for disagreement, but one big reason is that we are talking about different things.

It would be easier to convince people that AI could become dangerous if we focused discussion on AI that has all of humans’ cognitive abilities and more. It’s intuitively apparent that such an entity is dangerous, because humans are dangerous.

I think the unworried are often thinking of AI as a tool, while the worried are thinking about future AI that is more like a new intelligent species.

That distinction is highly intuitive. We deal with both tools and agents every day, with little overlap. Humans have rich intuitions about intelligent, goal-directed agents, since so much of our lives involves dealing with other humans. And yet, I don’t say “human-like AI” because I don’t want to invoke the intuitions that don’t apply: humans are all more-or-less similar in intelligence, can’t duplicate or upgrade themselves easily, etc.

Tools can be dangerous. Nuclear weapons are tools. But they are dangerous in a different way than a goal-directed, agentic threat. A bomb can go off if a human makes a decision or a mistake. But a tiger may eat you because it is hungry, wants to eat you, and can figure out how to find you and overcome your defenses. It[1] has an explicit goal of eating you, and some intelligence that will be applied to accomplishing that goal. It does not hate you, but it has other goals (survival) that make killing you an instrumental subgoal.

Tool AI warrants lesser, and different, worries than fully general AI does. Conflating the two in public discourse can make the worriers sound paranoid.

I think it’s also necessary to address another reason we aren’t already drawing this distinction: tool AI can be dangerous too, so we don’t want to limit the discussion to only highly agentic, fully sapient AI. And there’s no sharp line between the two; an oracle AI may have implicit goals and motivations.

But by failing to draw this distinction, we’re confusing the discussion. If we got people to consider the question “IF we made fully agentic AI, with every human cognitive ability and then some, THEN would I be concerned for our safety?”, that would be a big win, because the answer is obviously yes. The discussion could then move on to a more specific and productive debate: “will we do such a thing?”

There I think the answer is also yes, and soon, but that’s another story.[2]

In sum, discussion of risk models specific to strong AGI seems helpful for both internal and public-facing discourse. So again: what’s a better term for the really dangerous sort of AI?

  1. ^

    This may be a poor metaphor for modern-day humans who have forgotten what it’s like to have other large intelligent predators nearby; we could substitute a human threat, at risk of pulling in unwanted implications.

  2. ^

    It seems like the first thing we’ll do with powerful oracle AI (like better LLMs/foundation models) is use it to emulate an agent with those attributes. With a smart-enough oracle, that’s as simple as asking the question “what would you do if you were a self-aware, self-reflective entity with the following goals and properties?”; feeding its outputs into whatever UIs we want; and iterating that prompt along with new sensory inputs as needed.
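
    To make that loop concrete, here is a minimal sketch in Python. It is only an illustration of the iteration described above, not any existing system; `query_oracle` and `execute_action` are hypothetical stand-ins for the oracle call and for whatever UIs or effectors the outputs feed into.

    ```python
    from typing import Callable

    def agent_loop(
        query_oracle: Callable[[str], str],    # hypothetical: any oracle / foundation-model call
        execute_action: Callable[[str], str],  # hypothetical: carries out the action, returns an observation
        goals: str,
        max_steps: int = 10,
    ) -> None:
        """Iterate the 'what would you do if...' prompt, feeding outputs back in as new inputs."""
        observation = "Loop started; no observations yet."
        for _ in range(max_steps):
            prompt = (
                "What would you do if you were a self-aware, self-reflective entity "
                f"with the following goals and properties: {goals}\n"
                f"Most recent observation: {observation}\n"
                "Reply with one concrete action to take next."
            )
            action = query_oracle(prompt)
            observation = execute_action(action)  # output goes into the world; a new input comes back
    ```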

    In practice, I think there are many scaffolding shortcuts we’ll take rather than merely developing tool AI until it is trivial to turn it into an agent. Current LLMs are like an intelligent human with complete destruction of the episodic memory areas in the medial temporal lobe, and severe damage to the frontal lobes that provide executive function for flexible goal-direction. There are obvious and easy routes to creating systems that scaffold foundation models with those capabilities, as well as sensory and effector systems, and associated simulation abilities.
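
    As a toy illustration of one such scaffolding shortcut (again with hypothetical `query_oracle` and `execute_action` stand-ins), the sketch below bolts a crude episodic memory onto the loop above by logging past actions and observations and including recent entries in each prompt; real scaffolds would presumably be far richer.

    ```python
    from typing import Callable, List, Tuple

    def scaffolded_loop(
        query_oracle: Callable[[str], str],    # hypothetical oracle / foundation-model call
        execute_action: Callable[[str], str],  # hypothetical effector: runs the action, returns an observation
        goals: str,
        max_steps: int = 10,
        memory_window: int = 5,
    ) -> List[Tuple[str, str]]:
        """Agent loop with a crude episodic memory: a running log of (action, observation) pairs."""
        episodic_memory: List[Tuple[str, str]] = []
        observation = "Scaffolded loop started; no observations yet."
        for _ in range(max_steps):
            recent = "\n".join(f"- did: {a} -> saw: {o}" for a, o in episodic_memory[-memory_window:])
            prompt = (
                f"Goals: {goals}\n"
                f"Recent episodes:\n{recent or '(none yet)'}\n"
                f"Latest observation: {observation}\n"
                "Choose the single next action that best serves the goals."
            )
            action = query_oracle(prompt)
            observation = execute_action(action)
            episodic_memory.append((action, observation))  # the 'episodic memory' the model itself lacks
        return episodic_memory
    ```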

    Thus, I think the risks from tool AI are real but probably not worth much of our worry budget; we will likely be eaten by a tiger of our own creation long before we can invent and mishandle an AI nuke. And there will be no time for that tiger to emerge accidentally from a tool system, because we’ll make it agentic on purpose before agency emerges. I’m even less worried about losing a metaphorical toe to a metaphorical AI adze in the meantime, although that could certainly happen.