AGIs may value intrinsic rewards more than extrinsic ones

TLDR: Although AI agent paradigms use explicit reward approaches, the psychology of human motivation suggests that humans value internally generated reward as much as, if not more than, external reward. I suggest that AIs that begin to exhibit behaviors that appear to be “internally” rewarded may be showing signs of AGI. But that might require us to create AIs that are no longer pure tools for solving individual tasks, but more holistic agents that exist in the world.

Introduction

AI research focuses on training artificial agents to be motivated by externally provided or generated rewards. In these paradigms, the greater the reward an agent obtains, the smaller the error signal and therefore the less there is left to learn. But human behavior is motivated by many different factors, with extrinsic rewards (e.g. getting food, money etc.) being just one component. Here I argue that AI agents that develop preferences for internal rewards may be more akin to humans and thus closer to AGI.

Behaviorism is a theory of learning externally rewarded behaviors—not a complete theory of learning and behavior

Behaviorist principles were developed starting in the early 20th century (Watson 1913) but are most associated with Skinner (see Skinner 1953). Briefly, behaviorists abandoned previously popular tools for explaining behavior, such as instincts, drives, needs and the black boxes of psychological states, and argued that only variables that could be observed directly and measured quantitatively were useful. A central tenet of the theory is that all behavior can be explained as a function of prior environmental stimuli; the environment makes the animal do it. There was no need for internal or hidden states.

Although very successful in explaining previously puzzling phenomena, behaviorism faced challenges. In particular, it required that animals operate “spontaneously” on the environment and that animals could not engage in any active “seeking”. This was a difficult sell: how would any animal behavior be initiated in the first place? The best explanation was that over time, and with sufficient reinforcement of rewardable actions, animals would begin to “emit” behaviors that had been externally reinforced in the past. This is a very mechanical explanation of complex biological organisms.

Cognitive theories of motivation account for curiosity and latent learning

Meanwhile, numerous experiments carried out in the middle of the 20th century revealed that external rewards were not required for learning (e.g. Tolman 1948). For example, rats would learn to navigate a maze even without reward, essentially just for the joy of doing so (!). And once reward was introduced, they would outperform rats that had been given reward from the beginning of the task. These types of behaviors, and many others that appear to be driven by “curiosity”, could not be explained by operant conditioning. Theories of “inner-loci” of control were thus posited, in which an agent contained an “inner cognitive model” of the world and used it to navigate and learn (e.g. Weiner 1986).

Inner sources of regulation provide a broader theory of human motivations

Developed over the second half of the 20th century, intrinsic motivation theories extended this work and argued that individuals are not just external reward maximizers, but are born with innate internal drives to seek out new information, experiences and relationships. Humans might thus value “internally generated rewards” just as much as, if not more than, external rewards. Several researchers (Nissen, Butler, Montgomery, Harlow) carried out research showing that both rats and monkeys would spend energy or even endure slight pain to satisfy curiosity or experience novelty. Self-Determination Theory (SDT; Deci and Ryan 1985) is perhaps the best-known culmination of the work on innate drives (with nearly 60,000 citations for a single work). Deci’s early work (1971) had shown that external rewards can actually corrupt internal drives. For example, people who receive extrinsic rewards for carrying out parts of their jobs may not feel as strongly about their jobs as those who are motivated by intrinsic factors like satisfaction at the end of the day. And subjects who solve puzzles for pleasure are more motivated and often do better than those who are paid. SDT posits that individuals are driven by three basic psychological needs: autonomy, competence and relatedness. These innate drives explain many of our intuitions about why human beings value certain things in life, but they also suggest strategies for achieving greater happiness. For instance, since we all enjoy personal mastery, it stands to reason that our sense of competence should be maximized. Similarly, when we experience feelings of deep connection to others (as in the love we feel for our partners, children and parents), we also feel more fulfilled.

It is possible to find evolutionary explanations for SDT and argue that innate drives are actually aimed at maximizing some eventual future reward. For example, humans evolved as social creatures, which means that we’re hardwired to crave meaningful interpersonal relationships and to care deeply about the welfare of others, but also that being in groups kept us safer and more protected. Thus “relatedness” may be a proxy for this potential evolutionary advantage. Similarly, autonomy is a kind of guarantee that we seek to be “actors” or agents in the world rather than passive observers. Being autonomous might thus increase our chances of survival.

Overall, it could be argued that curiosity and novelty exploration are rationally connected to potential external rewards in the future (e.g. learning your environment and where potential food or danger sources may lie). But even if so, we (along with many non-human animals) seem to enjoy and receive significant fulfillment from many activities that are extremely unlikely to lead to external rewards (e.g. play, reading etc.).

Internal-reward seeking AIs may point the way to AGIs

Might AIs develop innate needs that go beyond receiving external rewards and maximizing utility? This is not the same as Steve Omohundro’s AI Drives work, where AGIs essentially converge towards autonomy (etc.) so they can maximize utility. What might a non-utility-maximizing AGI look like? Can we even create one? Perhaps it would engage in curiosity-like behaviors that don’t immediately lead to external reward. This is challenging to conceptualize, as AIs do not have biological drives for survival and procreation. And most research is highly task focused (i.e. complete the task, not explore the environment randomly). For now, it seems confusing to think of AIs as real agents in the world that we might create and train to do more than complete individual tasks.
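To make “curiosity-like behavior without external reward” a little more concrete, here is a minimal sketch of one way it is sometimes approximated in reinforcement learning: a count-based novelty bonus. This is an illustration under my own assumptions, not a claim about how an AGI would work. The function names (`novelty_bonus`, `total_reward`, `explore_corridor`), the weight `beta` and the toy corridor world are all hypothetical. The external reward is zero everywhere, so any movement is driven purely by the internal bonus.

```python
import math


def novelty_bonus(visit_count: int) -> float:
    """Hypothetical intrinsic reward: high for rarely visited states,
    shrinking as a state becomes familiar."""
    return 1.0 / math.sqrt(visit_count + 1)


def total_reward(extrinsic: float, visit_count: int, beta: float = 0.5) -> float:
    """Combine the external reward with a weighted internal bonus.
    `beta` is an assumed knob for how much the agent values novelty."""
    return extrinsic + beta * novelty_bonus(visit_count)


def explore_corridor(n_states: int = 5, steps: int = 20) -> dict:
    """Greedy walk along a 1-D corridor of `n_states` cells.

    The extrinsic reward is 0.0 in every cell, so the agent's choices
    depend only on the novelty bonus. Returns visit counts per cell.
    """
    visits = {0: 1}  # the agent starts in cell 0
    state = 0
    for _ in range(steps):
        neighbours = [s for s in (state - 1, state + 1) if 0 <= s < n_states]
        # Move to the neighbour with the highest combined reward.
        state = max(neighbours, key=lambda s: total_reward(0.0, visits.get(s, 0)))
        visits[state] = visits.get(state, 0) + 1
    return visits


if __name__ == "__main__":
    print(explore_corridor())
```

Even in this toy setting the agent ends up covering the whole corridor with no task and no external payoff. Of course, the bonus here is still a number we designed and handed to the agent, which is exactly why it is unclear whether this counts as a genuinely “internal” reward in the sense discussed above.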