Sapience, understanding, and “AGI”
Epistemic status: I’m sure that “AGI” has become importantly confusing. I think it’s leaving out a critical set of capabilities. I think those capabilities are closely related, so acquiring them could create a jump in capabilities. I’m not sure of the best term to disambiguate the type of AGI that’s most dangerous, but I want to propose one that works decently well.
The term AGI has become muddled. It is now used for AI both with and without agency, contextual awareness, and the capacity to actively learn new facts and concepts. “Sapience” means understanding, wisdom and self-awareness, so it could be adopted as a disambiguating term. Understanding as humans perform it is an active process for testing and improving our knowledge. Understanding allows self-awareness and contextual awareness. It also implies agency, because our process of understanding is goal-directed.
I have three goals here. One is to point out how the term “AGI” is confusing x-risk discussions. The second is to discuss how human-like understanding is powerful, achievable, and dangerous. Last and least, I propose the specific term “sapience” for the set of powerful and dangerous capabilities provided by active understanding.
Understanding as an active process of testing and improving “fit” among concepts and world-models
Sapience implies agency and understanding
The concept of agency is not just a philosophical nicety; it’s pivotal in discussions about existential risk (x-risk) related to AI. Without a clear understanding and explicit treatment of agency, these discussions have become confused and potentially misleading. The need for a new term is evident, given widely varying definitions of AGI, and resulting disagreements about risks and capabilities.
“Sapience” appears to be our most fitting option. The term is used in various ways with weak connections to its etymology of wisdom or discernment, but its most common usages are the ones we need. In the realm of science fiction, it’s often employed to denote an intelligent, self-aware species distinct from humans. We, as “Homo Sapiens”, pride ourselves as the “understanding apes,” setting ourselves apart from our evolutionary kin. This self-ascription may spring from vanity, but it invokes a critical cognitive capacity we commonly refer to as “understanding.”
Human understanding
The debate surrounding whether large language models (LLMs) truly “understand” their output is a persistent one. Critics argue that LLMs lack genuine understanding, merely echoing word usage without deeper cognitive processing. Others suggest that humans are largely “stochastic parrots” as well. But we do more than parrot. If I use a term appropriately, that might evidence “an understanding” of it; but that is not what we usually mean by understanding. “Understand” is primarily a verb; understanding is a process. We understand in an important, active sense that current LLMs lack. In everyday usage, “understanding” implies an active engagement with concepts.
To say “I understand” is to assert that one has engaged with a concept, testing and exploring it. This process is akin to mentally simulating or “turning over” the concept, examining its fit with other data and ideas. For instance, understanding how a faucet works might involve visualizing the mechanism that allows water to flow upon moving the handle. These mental simulations can vary in detail and abstraction, contingent on the concept and the criteria one sets for claiming understanding. If understanding seems out of reach, one might actively pursue or formulate new hypotheses about the concept, evaluating these for their potential to foster understanding.
Imagine for a moment that you’ve just learned about the platypus. Someone’s told you some things about it, and you’re asking yourself if you understand. This active self-questioning is critical to our notion of understanding, and I think it is most easily achieved through what might be termed cognitive agency. You asked yourself whether you understood for a purpose. If there’s no reason for you to understand the concept, you probably won’t waste the time it would take to test your understanding. If you do ask, you probably have some criteria for adequate understanding, for your current purposes. That purpose could be curiosity, a desire to show that you understand things people explain to you, or more practical goals of deploying your new concept to achieve material ends.
To answer the question of whether you adequately understand, you’ll use one or more strategies to test your understanding. You might form a mental simulation of a platypus, and imagine it doing things you care about. That simulation attempt might reveal important missing information—is a platypus prehistoric or current? Is it huge or small? You might guess that it swims if its webbed feet have been described. You might ask yourself if it’s edible or dangerous if those are your concerns, and perform mental simulations exploring different ways you could hunt it, or it could hunt you.
These simulations may produce inconsistent results, predicting things you know aren’t true. Perhaps it seems incongruous that you haven’t already heard of a giant poisonous swimming animal (if you got the size wrong), or perhaps imagining a furred animal laying eggs registers as incongruous. If these tests of understanding fail, you may decide to spend additional time trying to improve that understanding. You may ask questions or seek more information, or you may change your assumptions and try more simulations, to see if that produces results more congruent with the rest of your world knowledge.
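To make the shape of that loop concrete, here is a minimal sketch in Python. It is only an illustration of the prose above: the callables (simulate, find_incongruities, revise, good_enough) are hypothetical placeholders for the cognitive operations described, not claims about how brains or any current system implement them.

```python
from typing import Any, Callable, Optional

def understand(
    concept: Any,
    simulate: Callable[[Any], Any],             # imagine the concept in use (e.g., a platypus swimming)
    find_incongruities: Callable[[Any], list],  # check the simulation's fit with other knowledge
    revise: Callable[[Any, list], Any],         # change assumptions or seek more information
    good_enough: Callable[[Any], bool],         # purpose-relative criteria for "I understand"
    max_attempts: int = 5,
) -> Optional[Any]:
    """Iteratively test and refine a model of `concept` until it fits
    the rest of one's knowledge well enough for the current purpose."""
    for _ in range(max_attempts):
        simulation = simulate(concept)
        problems = find_incongruities(simulation)
        if not problems and good_enough(simulation):
            return simulation   # understanding passes the test, for this purpose
        concept = revise(concept, problems)
    return None                 # give up, or settle for partial understanding
```

The point of the sketch is only that the loop is domain-general: nothing in it depends on platypuses rather than faucets, tools, or city plans.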
I think it’s hard to guess how quickly and easily these capacities might be developed for AI. Large language models appear to have all of the requisite abilities, and attempts to develop language model cognitive architectures that can organize those capacities into more complex cognition are in their early days.[1] If those capacities are as useful and understandable as I think, we might see other approaches also develop this capacity for active understanding.
Understanding implies full cognitive generality
This capacity for testing and enhancing one’s understanding is largely domain-general, akin to assessing and refining a tool’s design. Consequently, this active ability to understand includes a capacity for self-awareness and contextual awareness. Such a system can develop understanding of its own cognitive processes, and its relation to the surrounding world. It also suggests a level of agency, at least in the functional sense of pursuing the goal of refining or enhancing understanding. These abilities may be so closely related that it would be harder to create an AGI that has some but not all of them.
There are probably other paths to adequate functional understanding. LLMs do not need a human-like active process of testing and expanding their understanding to use concepts remarkably effectively. I think it will prove relatively easy and effective to add such an active process to agentic systems. If we do first create non-agentic AGI, I think we’ll quickly make it agentic and self-teaching, since those capacities may be relatively easy to “bolt on”, as language model cognitive architectures do for LLMs. I think the attainment of active understanding skills was probably a key achievement in accelerating our cognition far beyond our immediate ancestors, and I think the same achievement is likely to accelerate AI capabilities.
Leaving aside the above theories of why AGI might take a human-like route to understanding, “sapience” still seems like an effective term to capture an AGI that functionally understands itself and its context, and can extend its functional understanding to accomplish its goals.
“AGI” is a dangerously ambiguous term
AGI now means AI that has capabilities in many domains. But the original usage included agency and the capacity for self-improvement. A fully general intelligence doesn’t merely do many things; it can teach itself to do anything. We apply agency to improve our knowledge and abilities when we actively understand, and AGI in its fullest and most dangerous sense can too. Sapient AI is not just versatile across various domains but capable of understanding anything—including self-comprehension and situational awareness. Understanding one’s own thinking offers another significant advantage: the capability to apply one’s intelligence to optimize cognitive strategies (this is independent of recursive self-improvement of architecture and hardware). An AI that can actively understand can develop new capabilities.
The term sapience calls to mind our self-given title of Homo Sapiens. This title refers to the key difference between us and earlier hominids, and so sapience suggests an analogous difference between current narrow AI and successors that are better at understanding. Homo Erectus was modestly successful, but we Sapiens took over the world (and eliminated many species without malice toward them). The term invokes our rich intuition of humans as capable, dangerous, and ever-improving agents, hopefully without too many intuitions about specific human values (these are more closely connected to the older Homo aspect of our designation).
As it’s used now, the term AGI suffers from an ambiguity that’s crucial in x-risk discussions. It’s now used for an intelligence capable of performing a wide array of tasks, whether or not that intelligence is agentic, self-aware, or contextually aware. This usage seems to have slipped in from the term’s wider adoption by more conventionally minded pundits, economists, and technologists; they often assume that AI will remain a technology. But AGI isn’t just a technology if it’s agentic. The distinction might seem unimportant in economic contexts, but it’s crucial in x-risk discussions. The other existing terms I’ve found or thought of don’t seem to capture this distinction as well as sapience does, and/or they carry other unwanted implications and intuitions.[2]
When your terms are co-opted, you can fight to take them back, or shift to new terms. I don’t think it’s wise to fight, because fighting for terminology doesn’t usually work, and more importantly because fighting causes polarization. Creating more rabid anti-safety advocates could be very bad (if many experts are loudly proclaiming that AGI doesn’t carry an existential risk, policymakers may believe and cite whichever group suits their agenda).
Therefore, it seems useful to introduce a new term. Different risk models apply to sapient AI than to AGI without the capacities implied by sapience. By distinguishing these we can deconfuse our conversations, and more clearly convey the most severe risks we face. Artificial sapience (AS) or sapient AI (SAI) is my proposed terminology.
Disambiguating the terms we use for transformative AI seems like a pure win. I’m not sure if sapience is the best term to do that disambiguation, and I’d love to hear other ideas. Separately, if I’m right about the usefulness of active understanding, we might expect to see a capabilities jump when this capacity is achieved.
- ^
I confess to not knowing exactly how those complex processes work in the brain, but the cognitive outlines seem clear. I have some ideas about how biological networks might naturally capture incongruity between successive representations, and a detailed theory about how the necessary decision-making works. But these active understanding processes are somewhat different: they include a separate component of recognizing incongruity that isn’t needed even for decision-making and planning in complex domains. I think these are hard-won cognitive skills that we practice and develop during our educations. Our basic brain mechanisms provide an adequate basis for creating mental simulations and testing their congruity with other simulations, but we probably need to create skills based on those mechanisms.
Current LLMs seem to possess the requisite base capabilities, at least in limited form. They can both create simulations in linguistic form, and make judgments about the congruity of multiple statements. I’m concerned that we’re collectively overlooking how close language model cognitive architectures might be to achieving AGI. I think that a hard-coded decision tree using scripted prompts to evaluate branches could provide the skill that organizes these capacities into active understanding. Perhaps GPT4 and current architectures aren’t quite capable enough to get much traction in such an iterative process of testing and improving understanding. But it seems entirely plausible to me that GPT5 or GPT6 might have that capacity, when combined with improved and elaborated episodic memory, sensory networks, and coded (or learned) algorithms to emulate this type of strategic cognitive sequencing. I discuss the potentials of language model cognitive architectures (a more complete term for language model agents) here.
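As a purely illustrative sketch of what such a scripted loop might look like (the llm argument below stands in for any text-in, text-out model call, and the prompts are hypothetical, not drawn from any real system):

```python
from typing import Callable

def actively_understand(claim: str, llm: Callable[[str], str], max_rounds: int = 3) -> str:
    """Hypothetical scripted loop: simulate consequences, check congruity, revise."""
    belief = claim
    for _ in range(max_rounds):
        # 1. "Turn the concept over": ask the model to simulate consequences.
        consequences = llm(f"Assume: {belief}\nList concrete things that would follow.")
        # 2. Scripted congruity check against the model's broader knowledge.
        verdict = llm(
            "Do any of these conflict with well-established facts?\n"
            f"{consequences}\nAnswer CONSISTENT, or list the conflicts."
        )
        if verdict.strip().upper().startswith("CONSISTENT"):
            return belief       # this test of understanding passes
        # 3. Branch: revise the assumption and try again.
        belief = llm(f"Revise this claim to resolve these conflicts:\n{belief}\n{verdict}")
    return belief               # best revision found within the budget
```

Whether current models can get useful traction in a loop like this is exactly the open question; the sketch is only meant to show how little scaffolding the control flow itself requires.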
- ^
One possible alternative term is superintelligence. I think this does intuitively imply the capacity for rich understanding and extension of knowledge to any domain. But it does not firmly imply agency. More importantly, it also conveys an intuition of being more distant than the first self-aware, self-teaching AI. Artificial sapience does not imply better-than-human but merely near-human cognitive abilities. Parahuman AI is another possible term, but I think it too strongly implies human-like, while not pointing as clearly at a capacity for rich and extensible understanding. More explicit terms, like self-aware agentic AGI (SAAAAI?;) seem clumsy. Artificial sentience is another possibility, but sentience is more commonly used for having moral worth by virtue of having phenomenal consciousness and “feelings”. Avoiding those implications seems important for clear discussions of capabilities and x-risk. Agentic AGI might be adequate, but it leaves out the implication of active understanding, contextual awareness, and goal-directed learning. But I’m not sure that artificial sapience is the best term, so I’m happy to adopt another term for roughly the same set of concepts if someone has a better idea.
This geological era is the Anthropocene, and there is a mass extinction going on, because of all the effects we’ve had on the planet, after dominating just about every ecosystem on land. None of our ape or hominid relatives or forebears did that. We objectively are an exceptional species — that’s not vanity. Whether a word with roots in “wisdom” is quite the right descriptor for this difference is less clear, but it’s definitely related to us knowing a lot.
I’m not sure we need another term. Instead, it would be helpful to be precise about human agency versus AI agency. Some of the difficulty with terminology involves how much we tend to obscure the human labor underlying certain processes. It is not, for example, accurate to say that AI can now design cities. This kind of phrasing gives AI agency, and even anthropomorphizes it. It is more accurate to say city planners can now use a machine learning system to analyse survey data about which street designs are the safest and so on… If we could be more precise about what is going on with our systems and give humans their just due, then some of this confusion (and fear that the robots are taking over) would die down.
I’m also not sure we need a new term. But spelling out exactly what you mean in every statement gets cumbersome. I hate jargon, but there’s a reason for new terms for new concepts.
The issue I care about isn’t what AGI can do now; it’s what it can and will do in the future. If it keeps helping people design things, with no agency (goals) of its own, that’s great. It could go wrong, but that’s a subtle argument. My point is that we need a term to distinguish AI that just gives answers, like “how could this city be designed better”, from AI with goals like “design a better city”. That kind is the one we’re really worried about. Because designing the very best city implies using the most compute to do it, and getting the most compute might also imply keeping humans from interfering with your plans.
If we could ensure that AGI never has its own goals, I think most of the confusion and fear would and should die down. As it is, we’re mixing important concerns about agentic AGI with less clear and less terrifying concerns about non-agentic, tool or “oracle” AGI.