Roman Leventov comments on AGIs may value intrinsic rewards more than extrinsic ones

Roman Leventov 18 Nov 2022 8:40 UTC
1 point
I don’t have the time (or the knowledge) to respond to everything, but from your response, I worry that my article partially missed the target. I’m trying to argue that humans may not be just utility maximizers and that a large part of being human (or maybe any organism?) is to just enjoy the world via some quasi-non-rewarded types of behavior. So there’s no real utility for some, or perhaps the most important, things that we value. Seeking out “surprising” results does help AIs and humans learn, as does seeking out information. But I’m not sure human psychology supports human intrinsic rewards as necessarily related to utility maximization. I do view survival and procreation as genetically encoded drives, but they are not the innate drives I described above. It’s not completely clear what we gain when we enjoy being in the world, learning, socializing.
Let me rephrase your thought, as I understand it: “I don’t think humans are (pure) RL-like agents; they are more like ActInf agents” (by “pure” RL I mean RL without entropy regularization or other schemes that motivate exploration).
There is copious literature finding the neuronal, neuropsychological, or psychological makeup of humans to be “basically implementing Active Inference”, as well as “basically implementing RL”. The portion of this research that is more rigorous maps the empirical observations from neurobiology directly onto the mathematics of ActInf and RL, respectively. I think this kind of research is useful: it equips us with instruments to predict certain aspects of human behaviour, and suggests avenues for treating disorders.
The portion of this research that is less rigorous and more philosophical amounts to pointing out “it looks like humans behave here like ActInf agents”, or “it looks like humans behave here like RL agents”. This kind of philosophy is only useful for suggesting a direction for mining empirical observations, to either confirm or disprove theories that in this or that corner of behaviour/psychology, humans act more like ActInf agents or more like RL agents. (Note that I would not count observations from psychology here, because they are notoriously unreliable themselves; see the reproducibility crisis, etc.)
I’m aware of Friston’s free energy principle (it was one of the first things I looked at in graduate school). I personally view most of it as non-falsifiable, but I know that many have used it to derive useful interpretations of brain function.
RL is not falsifiable either. Both can be seen as normative theories of agency. Normative theories are unfalsifiable: they are prescriptions, or, if you want, the sources of the definition of agency.
However, I would say that ActInf is also a physical theory (apart from being normative) because it’s derived from (or at least related to) statistical mechanics and the principle of least action. RL is “just” a normative framework of agency because I don’t see any relationship with physics in it (again, if you don’t add entropy regularisation).
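A minimal sketch of what “pure” RL versus RL with entropy regularisation means, assuming a toy one-step (bandit) setting. The reward values, the coefficient `alpha`, and all function names are hypothetical and chosen only for illustration; nothing here is taken from the ActInf, GFlowNet, or LeCun papers. The pure objective is expected reward alone and is maximised by a deterministic policy, while the regularised objective adds `alpha` times the policy’s entropy and is maximised by a softmax policy that keeps exploring.

```python
import math

def expected_reward(policy, rewards):
    """Pure RL objective in a one-step setting: E_pi[r]."""
    return sum(p * r for p, r in zip(policy, rewards))

def entropy(policy):
    """Shannon entropy of the policy, in nats (0 * log 0 treated as 0)."""
    return -sum(p * math.log(p) for p in policy if p > 0)

def regularised_objective(policy, rewards, alpha):
    """Entropy-regularised objective: E_pi[r] + alpha * H(pi)."""
    return expected_reward(policy, rewards) + alpha * entropy(policy)

def softmax(values, temperature):
    """Softmax over values / temperature, shifted by the max for stability."""
    m = max(values)
    exps = [math.exp((v - m) / temperature) for v in values]
    z = sum(exps)
    return [e / z for e in exps]

# Hypothetical rewards for three actions and a hypothetical coefficient.
rewards = [1.0, 0.9, 0.0]
alpha = 0.5

greedy = [1.0, 0.0, 0.0]        # maximises the pure objective
soft = softmax(rewards, alpha)  # maximises the regularised objective

print("pure:       ", expected_reward(greedy, rewards), expected_reward(soft, rewards))
print("regularised:", regularised_objective(greedy, rewards, alpha),
      regularised_objective(soft, rewards, alpha))
```

Under the pure objective the greedy policy wins; under the regularised one the softmax policy wins, which is the sense in which entropy regularisation “motivates exploration”.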
I would say that my question, which I did not answer in the post, is whether we can design AIs that don’t seek to maximize some utility or minimize some cost.
I answered this question above: yes, you can design an AI that will not minimise or maximise any utility or cost, but only some form of energy. Just choose Active Inference, ReduNet, GFlowNet, or LeCun’s architecture^[1]. This is not just renaming “utility” to “energy”; there is a deep philosophical departure. (I’m not sure it’s articulated anywhere in a piece dedicated to this question. The best resources that I can recommend are the sections that discuss RL in the Active Inference book, LeCun’s paper (see the section “Reward is not enough”), and Bengio’s GFlowNet tutorial; all links are above.)
However, as I pointed out above, this doesn’t save you from instrumental convergence, which can be just as bad (for humans) as a prototypical utility/cost/paperclip maximiser.
If you want an agent that doesn’t instrumentally converge at all, please see the discussion of Mild Optimization.
1. Caveats apply: embedded agents could still emerge inside agents with these architectures, and these embedded agents might in principle be RL agents. Perhaps this is actually why humans sometimes exhibit RL-like behaviour, even though “fundamentally” they are more like ActInf agents.