AGIs may value intrinsic rewards more than extrinsic ones
TLDR: Although AI agent paradigms use explicit reward approaches, the psychology of human motivation suggests that humans value internally generated reward as much as, if not more than, external reward. I suggest that AIs that begin to exhibit behaviors that appear to be “internally” rewarded may be showing signs of AGI. But that might require us to create AIs that are no longer pure tools for solving individual tasks, but more holistic agents that exist in the world.
Introduction
AI research focuses on training artificial agents to be motivated by externally provided or generated rewards. The greater the reward, the less the error and therefore the less there is to be learned. But human behavior is motivated by many different factors, with extrinsic rewards (e.g. getting food, money, etc.) being just one component. Here I argue that AI agents that develop preferences for internal rewards may be more akin to humans and thus closer to AGI.
Behaviorism is a theory of learning externally rewarded behaviors—not a complete theory of learning and behavior
Behaviorist principles were developed starting in the early 20th century (Watson 1913) but are most associated with Skinner (see Skinner 1953). Briefly, behaviorists abandoned previously popular tools for explaining behavior, such as instincts, drives, needs and the black boxes of psychological states, and argued that only variables that could be observed directly and measured quantitatively were useful. A central tenet of the theory is that all behavior can be explained as a function of prior environmental stimuli; the environment makes the animal do it. There was no need for internal or hidden states.
Although very successful in explaining previously puzzling phenomena, behaviorism faced challenges. In particular, it required that animals operate “spontaneously” on the environment and that animals could not engage in any active “seeking”. This was a difficult sell: how would any animal behavior be initiated in the first place? The best explanation was that over time, and with sufficient reinforcement of rewardable actions, animals would begin to “emit” behaviors that were externally reinforced in the past. This is a very mechanical explanation of complex biological organisms.
Cognitive theories of motivation account for curiosity and latent learning
Meanwhile, numerous experiments carried out in the middle of the 20th century revealed that external rewards were not required for learning (e.g. Tolman 1948). For example, rats would learn to navigate a maze even without reward, essentially just for the joy of doing so (!). And when later provided with reward, they would outperform rats who were always given reward from the beginning of the task. These types of behaviors, and many others that appear to be driven by “curiosity”, could not be explained by operant conditioning. Theories of “inner-loci” of control were thus posited, in which an agent contained an “inner cognitive model” of the world and used it to navigate and learn (e.g. Weiner 1986).
Inner sources of regulation provide a broader theory of human motivations
Developed over the second half of the 20th century, intrinsic motivation theories extended this work and argued that individuals are not just external reward maximizers, but are born with innate internal drives to seek out new information, experiences and relationships. Humans might thus value “internally generated rewards” as much as, if not more than, external rewards. Several researchers (Nissen, Butler, Montgomery, Harlow) showed that both rats and monkeys would spend energy or even endure slight pain to satisfy curiosity or experience novelty. Deci and Ryan’s (1985) Self-Determination Theory (SDT) is perhaps the best-known culmination of the work on innate drives (with nearly 60,000 citations for a single work). Deci’s early work (1971) had shown that external rewards can actually corrupt internal drives. For example, people who receive extrinsic rewards for carrying out parts of their jobs may not feel as strongly about their jobs as those whose main reward is intrinsic, such as satisfaction at the end of the day. And subjects who solve puzzles for pleasure are more motivated and often do better than those who are paid. SDT posits that individuals are driven by three psychological forces: autonomy, competence and relatedness. These innate drives explain many of our intuitions about why human beings value certain things in life, but they also suggest strategies for achieving greater happiness. For instance, since we all enjoy personal mastery, it stands to reason that we should seek to maximize our sense of competence. Similarly, when we experience feelings of deep connection to others (as in the love we feel for our partners, children and parents), we also feel more fulfilled.
It is possible to find evolutionary biology-based explanations for SDT—and argue that innate drives are actually drives aimed at potentially maximizing some eventual future reward. For example, humans evolved as social creatures, which means that we’re hardwired to crave meaningful interpersonal relationships and to care deeply about the welfare of others—but also that being in groups kept us safer and more protected. Thus “relatedness” is maybe a proxy for this potential evolutionary advantage. Also, autonomy is a type of guarantee that we are seeking to be “actors” or agents in the world—rather than passive observers. Being autonomous might thus increase our chances of survival.
Overall, it could be argued that curiosity and novelty exploration are rationally connected to potential external rewards in the future (e.g. learning your environment and where potential food or danger sources may lie). But even if so, we (along with many other non-human animals) seem to enjoy and receive significant fulfillment from many activities that are extremely unlikely to lead to external rewards (e.g. play, reading etc).
Internal-reward seeking AIs may point the way to AGIs
Might AIs develop innate needs that go beyond receiving external rewards and maximizing utility? This is not the same as Steve Omohundro’s AI Drives work, where AGIs essentially converge towards autonomy (etc.) so they can maximize utility. What might a non-utility-maximizing AGI look like? Can we even create one? Perhaps it would engage in curiosity-like behaviors that don’t immediately lead to external reward. This is challenging to conceptualize, as AIs do not have biological drives for survival and procreation, and most research is highly task-focused (i.e. complete the task, not explore the environment randomly). For now, it seems confusing to think of AIs as real agents in the world that we might create and train to do more than complete individual tasks.
It’s pretty easy to make your references actual footnotes with Google Scholar links using either the Markdown or HTML editor, and this really helps readers follow your references, versus just mentioning them vaguely like “Deci and Ryan (1985)”.
I see play serving some vital functions:
exploring new existential modes. Trying out new ways of being without having to take a leap of faith.
connecting with people, and building trust. I include things like flirting, banter, and make-believe here.
As for reading, I think of it as a version of exploring.
Note that there are certain behaviours that I’m sure aren’t very adaptive, but I have a hunch that many of them can be traced back to some manner of fitness improvement. My current hunch (pinch of salt please) is that most seemingly unnecessary action-categories either serve a hidden purpose, or are “side effects”. By “side effects”, I mean that the actions and habits spring from a root shared with other (more adaptive) behaviour patterns. This “root” can be a shard residing at a high abstraction level, or some instinct, depending on your view.
Also, as I’m writing this, I realize that this is very hard to falsify and that my claims aren’t super rigorous. Hope it can be of some use to someone anyway.
Thanks for the reply, Jonathan. Indeed, I’m also a bit skeptical that our innate drives (whether the ones from SDT or others) are really non-utility maximizing. But in some cases they do appear so.
One possibility is that they were driven to evolve for utility maximization but have now broken off completely and serve some difficult-to-understand purpose. I think there are similar theories of how consciousness developed—i.e. that it evolved as a by-effect/side-effect of some inter-organism communication—and now plays many other roles.
Maximisation of explicit reward is the defining feature of Reinforcement Learning, but this is just one of many agentic intelligence architectures (see alternatives listed here). The architectures that I mention below in this comment: Active Inference, ReduNets, and GFlowNets, all use intrinsic motivation. And modern Reinforcement Learning with Entropy Regularization also models intrinsic motivation. Therefore, no, intrinsic motivation is not a sign of looming AGI. In principle, agents on any scale and capability level can have it.
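To make the point about entropy regularization concrete, here is a minimal sketch for a one-step (bandit) setting. It is my own illustration, not taken from any of the cited works: the reward values, the temperature `alpha`, and all function names are assumptions. The objective is expected external reward plus `alpha` times the policy's entropy, and the maximizer of that objective is the softmax of `rewards / alpha`, so the agent keeps some probability on every action instead of collapsing onto the single best one.

```python
import math

def entropy(policy):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in policy if p > 0)

def regularized_objective(policy, rewards, alpha):
    """Expected external reward plus an entropy bonus weighted by alpha."""
    expected_reward = sum(p * r for p, r in zip(policy, rewards))
    return expected_reward + alpha * entropy(policy)

def soft_optimal_policy(rewards, alpha):
    """Maximizer of the regularized objective: softmax(rewards / alpha)."""
    top = max(rewards)  # subtract the max for numerical stability
    weights = [math.exp((r - top) / alpha) for r in rewards]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical rewards for three actions (illustrative values only).
rewards = [1.0, 0.9, 0.0]
alpha = 0.5

greedy = [1.0, 0.0, 0.0]                    # pure reward maximizer
soft = soft_optimal_policy(rewards, alpha)  # entropy-regularized agent

greedy_score = regularized_objective(greedy, rewards, alpha)
soft_score = regularized_objective(soft, rewards, alpha)
```

Under these assumptions the soft policy scores higher on the regularized objective than the greedy one, even though it earns less external reward in expectation; the entropy term is what pays for the exploration.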
See Friston et al. “Active inference and epistemic value” (2015).
Also, the maximisation of information gain (aka epistemic value, Bayesian surprise, intrinsic motivation, optimal (Bayesian) experimental design, and the infomax principle) is discussed a lot in the more recent book Active Inference (Parr, Pezzulo, and Friston 2022), especially in Chapters 2 and 10. The book is available online for free. It also discusses the exploration-exploitation tradeoff, as well as the relationships between Active Inference, decision-making, and other (agentic) intelligence frameworks.
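As a rough illustration of what “maximising information gain” means, here is a small sketch (my own toy example, not code from the book; the hypotheses, likelihood tables, and function names are all assumed for illustration). Expected information gain is the expected KL divergence from the prior to the posterior over hypotheses, averaged over the possible observations. No external reward appears anywhere: an agent using this criterion simply prefers the experiment that is expected to change its beliefs the most.

```python
import math

def kl_divergence(q, p):
    """KL(q || p) in nats for two discrete distributions."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

def expected_information_gain(prior, likelihood_table):
    """Expected KL from prior to posterior, averaged over observations.

    likelihood_table[o][h] is P(observation o | hypothesis h).
    """
    gain = 0.0
    for likelihoods in likelihood_table:
        joint = [p * l for p, l in zip(prior, likelihoods)]
        evidence = sum(joint)  # P(observation o)
        if evidence == 0:
            continue  # this observation can never occur
        posterior = [j / evidence for j in joint]
        gain += evidence * kl_divergence(posterior, prior)
    return gain

# Two hypotheses about the world, equally likely a priori (toy values).
prior = [0.5, 0.5]

# Experiment A: the outcome depends strongly on which hypothesis is true.
informative = [[0.1, 0.9],   # P(o=0 | h0), P(o=0 | h1)
               [0.9, 0.1]]   # P(o=1 | h0), P(o=1 | h1)

# Experiment B: the outcome is the same coin flip under both hypotheses.
uninformative = [[0.5, 0.5],
                 [0.5, 0.5]]

gain_a = expected_information_gain(prior, informative)
gain_b = expected_information_gain(prior, uninformative)
best = "A" if gain_a > gain_b else "B"
```

In this toy setup the agent picks experiment A, because B cannot change its beliefs at all. The gain can never exceed the entropy of the prior (ln 2 nats here), which is the most there is to learn.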
A normative theory of agency alternative to Active Inference, the Principle of Maximizing Rate Reduction, can also be viewed as a generalisation of information gain (Chan, Yu, You et al. 2022). See also this recent workshop with Ma and Friston (the masterminds behind ReduNets and Active Inference, respectively).
Yet another framework suitable for AI agents, GFlowNet, also doesn’t use the notion of reward maximisation and instead fits the “reward function”, which is conceptually very similar to Active Inference agents minimising their expected free energy.
If by “internal reward” you mean “intrinsically determined preferences/goals”, then Active Inference operationalises it, too, as prior preferences, that can be learned just as everything else. Which is the answer to your question “Might AIs develop innate needs that go beyond receiving external rewards and maximizing utility?”
Neither Active Inference nor ReduNets nor GFlowNets nor LeCun’s architecture “maximise utility”, but they still all instrumentally converge. Instrumental convergence roughly equals capability (fitness).
Hi Roman.
First of all, thank you so much for reading and taking the time to respond.
I don’t have the time, or the knowledge, to respond to everything, but from your response, I worry that my article partially missed the target. I’m trying to argue that humans may not be just utility maximizers and that a large part of being human (or maybe any organism?) is to just enjoy the world via some quasi-non-rewarded types of behavior. So there’s no real utility for some, or perhaps the most important, things that we value. Seeking out “surprising” results does help AIs and humans learn, as does seeking out information. But I’m not sure human psychology supports the view that human intrinsic rewards are necessarily related to utility maximization. I do view survival and procreation as genetically encoded drives, but they are not the innate drives I described above. It’s not completely clear what we gain when we enjoy being in the world, learning, socializing.
I’m aware of Friston’s free energy principle (it was one of the first things I looked at in graduate school). I personally view most of it as non-falsifiable, but I know that many have used it to derive useful interpretations of brain function.
Also, I quickly googled LeCun’s proposal for future AI, and his intrinsic motivation module appears to be largely about bootstrapped goals, albeit human pro-social ones:
The ultimate goal of the agent is minimize the intrinsic cost over the long run. This is where basic behavioral drives and intrinsic motivations reside. The design of the intrinsic cost module determines the nature of the agent’s behavior. Basic drives can be hard-wired in this module. This may include feeling “good” (low energy) when standing up to motivate a legged robot to walk, when influencing the state of the world to motivate agency, when interacting with humans to motivate social behavior, when perceiving joy in nearby humans to motivate empathy, when having a full energy supplies (hunger/satiety), when experiencing a new situation to motivate curiosity and exploration, when fulfilling a particular program, etc
I would say that my question—which I did not answer in the post—is whether we can design AIs that don’t seek to maximize some utility or minimize some cost? What would that look like? Some computer-cluster just spinning up to do computations for no effective purpose?
I don’t really have an answer here.
Let me rephrase your thought, as I understand it: “I don’t think humans are (pure) RL-like agents, they are more like ActInf agents” (by “pure” RL I mean RL without entropy regularization, or other schemes that motivate exploration).
There is copious literature finding the neuronal, neuropsychological, or psychological makeup of humans “basically implementing Active Inference”, as well as “basically implementing RL”. The portion of this research that is more rigorous maps the empirical observations from neurobiology directly onto the mathematics of ActInf and RL, respectively. I think this kind of research is useful, it equips us with instruments to predict certain aspects of human behaviour, and suggests avenues for disorder treatment.
The portion of this research that is less rigorous and more philosophical, is like pointing out “it looks like humans behave here like ActInf agents”, or “it looks like humans behave here like RL agents”. This kind of philosophy is only useful for suggesting a direction for mining empirical observations, to either confirm or disprove theories that in this or that corner of behaviour/psychology, humans act more like ActInf, or RL agents. (Note that I would not count observations from psychology here, because they are notoriously unreliable themselves, see reproducibility crisis, etc.)
RL is not falsifiable either. Both can be seen as normative theories of agency. Normative theories are unfalsifiable; they are prescriptions, or, if you want, the sources of the definition of agency.
However, I would say that ActInf is also a physical theory (apart from being normative) because it’s derived from (or at least related to) statistical mechanics and the principle of least action. RL is “just” a normative framework of agency because I don’t see any relationship with physics in it (again, if you don’t add entropy regularisation).
I answered this question above: yes, you can design AI that will not minimise or maximise any utility or cost, but only some form of energy. Just choose Active Inference, ReduNet, GFlowNet, or LeCun’s architecture[1]. It’s not just renaming “utility” to “energy”; there is a deep philosophical departure. (I’m not sure it’s articulated anywhere in a piece dedicated to this question; the best resources that I can recommend are the sections that discuss RL in the Active Inference book, LeCun’s paper (see the section “Reward is not enough”), and Bengio’s GFlowNet tutorial; all links are above.)
However, as I pointed out above, this doesn’t save you from instrumental convergence. Which can be just as bad (for humans) as a prototypical utility/cost/paperclip maximiser.
If you want an agent that doesn’t instrumentally converge at all, please see the discussion of Mild Optimization.
Caveats apply: embedded agents could still emerge inside agents with these architectures, and these embedded agents might in principle be RL. Perhaps, this is actually why humans sometimes exhibit RL-like behaviour, even though “fundamentally” they are more like ActInf agents.