Although AI agent paradigms use explicit reward approaches, the psychology of human motivation suggests that humans value internally generated reward as much as, if not more than, external reward. I suggest that AIs that begin to exhibit behaviors that appear to be “internally” rewarded may be showing signs of AGI.
Maximisation of explicit reward is the defining feature of Reinforcement Learning, but this is just one of many agentic intelligence architectures (see alternatives listed here). The architectures that I mention below in this comment (Active Inference, ReduNets, and GFlowNets) all use intrinsic motivation, and modern Reinforcement Learning with Entropy Regularization also models intrinsic motivation. Therefore, no, intrinsic motivation is not a sign of looming AGI. In principle, agents on any scale and capability level can have it.
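To make the point about entropy regularization concrete, here is a minimal sketch for a one-step (bandit-like) problem. It is an illustration under simplifying assumptions, not any particular library's implementation: the function names and the `temperature` parameter are hypothetical. It relies on the standard result that maximising expected reward plus `temperature` times policy entropy yields a softmax policy over rewards, so the agent keeps some built-in tendency to explore.

```python
import math

def soft_policy(rewards, temperature):
    """Softmax policy: the maximiser of E[r] + temperature * H(policy) in a one-step problem."""
    m = max(rewards)  # subtract the max for numerical stability
    weights = [math.exp((r - m) / temperature) for r in rewards]
    z = sum(weights)
    return [w / z for w in weights]

def entropy(policy):
    """Shannon entropy in nats, skipping zero-probability actions."""
    return -sum(p * math.log(p) for p in policy if p > 0)

def regularised_objective(policy, rewards, temperature):
    """Expected reward plus the entropy bonus."""
    expected_reward = sum(p * r for p, r in zip(policy, rewards))
    return expected_reward + temperature * entropy(policy)

# Hypothetical rewards for three actions.
example_rewards = [1.0, 0.5, 0.0]
greedy = [1.0, 0.0, 0.0]                   # "pure" reward maximiser
soft = soft_policy(example_rewards, 1.0)   # entropy-regularised agent
print(regularised_objective(greedy, example_rewards, 1.0))
print(regularised_objective(soft, example_rewards, 1.0))
```

As the temperature goes to zero, the soft policy approaches the greedy one, which is one way to see "pure" RL as a limiting case.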
Epistemic value and implicit exploratory behavior are related to curiosity in psychology (Harlow, 1950; Ryan & Deci, 1985) and intrinsic motivation in reinforcement learning (Baldassarre & Mirolli, 2013; Barto, Singh, & Chentanez, 2004; Oudeyer & Kaplan, 2007; Schembri, Mirolli, & Baldassare, 2007; Schmidhuber, 1991). Here intrinsic stands in opposition to extrinsic (e.g., drive or goal) value. While we have focused on reducing uncertainty during inference, most reinforcement learning research uses curiosity or novelty-based mechanisms to learn a policy or model efficiently. The general idea here is that an agent should select actions that improve learning or prediction, thus avoiding behaviors that preclude learning (either because these behaviors are already learned or because they are unlearnable). It has often been emphasized that adaptive agents should seek out surprising stimuli, not unsurprising stimuli as assumed in active inference. This apparent discrepancy can be reconciled if one considers that surprising events, in the setting of curiosity and Bayesian surprise, are simply outcomes that are salient and minimize uncertainty. In active inference, agents are surprised when they do not minimize uncertainty. It is salient (counterfactual) outcomes that optimize exploration (and model selection) and salience-seeking behavior stems nicely from the more general objective of minimizing expected free energy (or surprise proper).
There is, however, an important difference between active inference and the concepts of curiosity and Bayesian surprise, at least as they are usually used. Salience is typically framed in “bottom-up” terms, in that the agents are not assumed to have a particular goal or task. This is also a characteristic of curiosity (and similar) algorithms that try to learn all possible models, without knowing in advance which will be useful for achieving a specific goal. The active inference scheme considered here contextualizes the utilitarian value of competing policies in terms of their epistemic value, where the implicit reduction in uncertainty is (or can be) tailored for the goals or preferred outcomes in mind.
Also, the maximisation of information gain (aka epistemic value, Bayesian surprise, intrinsic motivation, optimal (Bayesian) experimental design, and the infomax principle) is discussed at length in the more recent book Active Inference (Parr, Pezzulo, and Friston 2022), especially in Chapters 2 and 10. The book is available online for free. It also discusses the exploration-exploitation tradeoff, as well as the relationships of Active Inference with decision-making and other (agentic) intelligence frameworks.
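As a rough illustration of what "information gain" means here, the sketch below computes the expected KL divergence between posterior and prior beliefs over hidden states, averaged over predicted outcomes (equivalently, the mutual information between states and outcomes). The discrete two-state model, the function names, and the numbers are assumptions made for the example; this is not code from the book or the 2015 paper.

```python
import math

def kl(p, q):
    """KL(p || q) in nats, skipping terms where p is zero."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def expected_information_gain(prior, likelihood):
    """Expected Bayesian surprise of an observation.

    prior[s]         -- current belief q(s) over hidden states
    likelihood[s][o] -- P(o | s) for the observation being considered
    """
    gain = 0.0
    for o in range(len(likelihood[0])):
        joint = [p * row[o] for p, row in zip(prior, likelihood)]
        p_o = sum(joint)                      # predicted probability of outcome o
        if p_o == 0:
            continue
        posterior = [j / p_o for j in joint]  # Bayes' rule
        gain += p_o * kl(posterior, prior)
    return gain

# Hypothetical example: two hidden states, two possible observations.
belief = [0.5, 0.5]
informative_action = [[0.9, 0.1], [0.1, 0.9]]    # outcome depends on the state
uninformative_action = [[0.5, 0.5], [0.5, 0.5]]  # outcome says nothing about the state
print(expected_information_gain(belief, informative_action))
print(expected_information_gain(belief, uninformative_action))
```

An agent that scores actions by this quantity prefers the informative action even though no external reward is involved, which is the sense of "intrinsic" at stake.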
Yet another framework suitable for AI agents, GFlowNet, also doesn’t use the notion of reward maximisation and instead fits the “reward function”, which is conceptually very similar to Active Inference agents minimising their expected free energy.
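The difference between maximising a reward and fitting it can be sketched in a few lines. The code below is not a GFlowNet (there is no network or training loop); it only contrasts the two targets on a toy set of objects, assuming the usual description of a trained GFlowNet as sampling objects with probability proportional to their reward. The object names and reward values are hypothetical.

```python
import random

def reward_maximiser(rewards):
    """A pure maximiser always returns the single highest-reward object."""
    return max(rewards, key=rewards.get)

def reward_proportional_distribution(rewards):
    """Target distribution p(x) = R(x) / Z, where Z is the sum of all rewards."""
    z = sum(rewards.values())
    return {x: r / z for x, r in rewards.items()}

def sample_proportional(rewards, rng):
    """Draw one object with probability proportional to its reward."""
    objects = list(rewards)
    return rng.choices(objects, weights=[rewards[x] for x in objects], k=1)[0]

# Hypothetical toy rewards over three candidate objects.
toy_rewards = {"a": 1.0, "b": 2.0, "c": 1.0}
rng = random.Random(0)
print(reward_maximiser(toy_rewards))
print(reward_proportional_distribution(toy_rewards))
print([sample_proportional(toy_rewards, rng) for _ in range(10)])
```

The maximiser collapses onto one object, while the reward-proportional sampler keeps visiting every object with non-zero reward, which is why diversity comes for free in the second case.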
First of all, thank you so much for reading and taking the time to respond.
I don’t have the time (or the knowledge) to respond to everything, but from your response, I worry that my article partially missed the target. I’m trying to argue that humans may not be just utility maximizers, and that a large part of being human (or maybe any organism?) is to just enjoy the world via some quasi-non-rewarded types of behavior. So there’s no real utility for some, or perhaps the most important, things that we value. Seeking out “surprising” results does help AIs and humans learn, as does seeking out information. But I’m not sure human psychology supports human intrinsic rewards as necessarily related to utility maximization. I do view survival and procreation as genetically encoded drives, but they are not the innate drives I described above. It’s not completely clear what we gain when we enjoy being in the world, learning, socializing.
I’m aware of Friston’s free energy principle (it was one of the first things I looked at in graduate school). I personally view most of it as non-falsifiable, but I know that many have used it to derive useful interpretations of brain function.
Also, I quickly googled LeCun’s proposal and his conception of future AI. His intrinsic motivation module is largely about bootstrapped goals, albeit human pro-social ones.
The ultimate goal of the agent is minimize the intrinsic cost over the long run. This is where basic behavioral drives and intrinsic motivations reside. The design of the intrinsic cost module determines the nature of the agent’s behavior. Basic drives can be hard-wired in this module. This may include feeling “good” (low energy) when standing up to motivate a legged robot to walk, when influencing the state of the world to motivate agency, when interacting with humans to motivate social behavior, when perceiving joy in nearby humans to motivate empathy, when having a full energy supplies (hunger/satiety), when experiencing a new situation to motivate curiosity and exploration, when fulfilling a particular program, etc.
I would say that my question—which I did not answer in the post—is whether we can design AIs that don’t seek to maximize some utility or minimize some cost? What would that look like? Some computer-cluster just spinning up to do computations for no effective purpose?
I don’t have the time (or the knowledge) to respond to everything, but from your response, I worry that my article partially missed the target. I’m trying to argue that humans may not be just utility maximizers, and that a large part of being human (or maybe any organism?) is to just enjoy the world via some quasi-non-rewarded types of behavior. So there’s no real utility for some, or perhaps the most important, things that we value. Seeking out “surprising” results does help AIs and humans learn, as does seeking out information. But I’m not sure human psychology supports human intrinsic rewards as necessarily related to utility maximization. I do view survival and procreation as genetically encoded drives, but they are not the innate drives I described above. It’s not completely clear what we gain when we enjoy being in the world, learning, socializing.
Let me rephrase your thought, as I understand it: “I don’t think humans are (pure) RL-like agents, they are more like ActInf agents” (by “pure” RL I mean RL without entropy regularization, or other schemes that motivate exploration).
There is copious literature finding the neuronal, neuropsychological, or psychological makeup of humans “basically implementing Active Inference”, as well as “basically implementing RL”. The portion of this research that is more rigorous maps the empirical observations from neurobiology directly onto the mathematics of ActInf and RL, respectively. I think this kind of research is useful: it equips us with instruments to predict certain aspects of human behaviour and suggests avenues for disorder treatment.
The portion of this research that is less rigorous and more philosophical amounts to pointing out “it looks like humans behave here like ActInf agents” or “it looks like humans behave here like RL agents”. This kind of philosophy is only useful for suggesting a direction for mining empirical observations, to either confirm or disprove theories that, in this or that corner of behaviour/psychology, humans act more like ActInf or RL agents. (Note that I would not count observations from psychology here, because they are notoriously unreliable themselves; see the reproducibility crisis, etc.)
I’m aware of Friston’s free energy principle (it was one of the first things I looked at in graduate school). I personally view most of it as non-falsifiable, but I know that many have used it to derive useful interpretations of brain function.
RL is not falsifiable, either. Both can be seen as normative theories of agency. Normative theories are unfalsifiable; they are prescriptions or, if you want, the sources of the definition of agency.
However, I would say that ActInf is also a physical theory (apart from being normative) because it’s derived from (or at least related to) statistical mechanics and the principle of least action. RL is “just” a normative framework of agency because I don’t see any relationship with physics in it (again, if you don’t add entropy regularisation).
I would say that my question—which I did not answer in the post—is whether we can design AIs that don’t seek to maximize some utility or minimize some cost?
I answered this question above: yes, you can design an AI that will not minimise or maximise any utility or cost, but only some form of energy. Just choose Active Inference, ReduNet, GFlowNet, or LeCun’s architecture[1]. It’s not just renaming “utility” into “energy”; there is a deep philosophical departure. (I’m not sure it’s articulated anywhere in a piece dedicated to this question. The best resources that I can recommend are the sections which discuss RL in the Active Inference book, LeCun’s paper (see the section “Reward is not enough”), and Bengio’s GFlowNet tutorial; all links are above.)
However, as I pointed out above, this doesn’t save you from instrumental convergence, which can be just as bad (for humans) as a prototypical utility/cost/paperclip maximiser.
If you want an agent that doesn’t instrumentally converge at all, please see the discussion of Mild Optimization.
Caveats apply: embedded agents could still emerge inside agents with these architectures, and these embedded agents might in principle be RL agents. Perhaps this is actually why humans sometimes exhibit RL-like behaviour, even though “fundamentally” they are more like ActInf agents.
The passage on epistemic value quoted above is from Friston et al., “Active inference and epistemic value” (2015).
A normative theory of agency alternative to Active Inference, the Principle of Maximizing Rate Reduction, can also be viewed as a generalisation of information gain (Chan, Yu, You et al. 2022). See also this recent workshop with Ma and Friston (the masterminds behind ReduNets and Active Inference, respectively).
If by “internal reward” you mean “intrinsically determined preferences/goals”, then Active Inference operationalises that, too, as prior preferences, which can be learned just like everything else. This is the answer to your question “Might AIs develop innate needs that go beyond receiving external rewards and maximizing utility?”
Neither Active Inference nor ReduNets nor GFlowNets nor LeCun’s architecture “maximise utility”, but they all still instrumentally converge. Instrumental convergence is roughly equivalent to capability (fitness).
Hi Roman.
I don’t really have an answer here.