jacob_cannell comments on Thought Experiments Provide a Third Anchor

jacob_cannell 24 Jan 2022 22:39 UTC
10 points
Thanks for the organized reply, I’ll try to keep the same format.

Intrinsic motivation / curiosity:

You are familiar with the serotonergic and dopaminergic pathways and associated learning systems, typically simplified to an unsupervised learning component and a reward learning component.

My main point is that this picture is incomplete/incorrect, and the brain’s main learning system involves some form of empowerment. Curiosity is typically formulated as improvement in prediction capability, so it’s like a derivative of more standard unsupervised learning (and thus probably a component of that system). But that alone isn’t so great at learning for the roughly half of the brain involved in action/motor/decision/planning. Some form of ‘empowerment’ criterion (specifically, maximization of mutual information between actions and future world state, or observations, though the former is probably better) is a more robust general learning signal for action learning, and seems immune to the problems that plague pure curiosity approaches, like the rule 101 type issues you mention.
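A small numerical sketch may make the empowerment criterion concrete. It assumes a tiny discrete world described by a hypothetical transition table `p_s_given_a[a][s] = P(next state s | action a)`, and treats empowerment as the channel capacity max over p(a) of I(A; S'), estimated with the standard Blahut-Arimoto iteration. The function names and toy tables are illustrative assumptions, not anything specified in this thread, and not a claim about how the brain computes such a signal.

```python
import math

def mutual_information(p_a, p_s_given_a):
    """I(A; S') in bits, for action distribution p_a and a table
    p_s_given_a[a][s] = P(next state s | action a)."""
    n_states = len(p_s_given_a[0])
    # Marginal over future states: p(s') = sum_a p(a) * p(s'|a)
    p_s = [sum(p_a[a] * p_s_given_a[a][s] for a in range(len(p_a)))
           for s in range(n_states)]
    mi = 0.0
    for a, pa in enumerate(p_a):
        for s, psa in enumerate(p_s_given_a[a]):
            if pa > 0 and psa > 0:
                mi += pa * psa * math.log2(psa / p_s[s])
    return mi

def empowerment(p_s_given_a, iters=200):
    """Approximate max over p(a) of I(A; S') with Blahut-Arimoto updates."""
    n_actions = len(p_s_given_a)
    n_states = len(p_s_given_a[0])
    p_a = [1.0 / n_actions] * n_actions  # start from uniform actions
    for _ in range(iters):
        p_s = [sum(p_a[a] * p_s_given_a[a][s] for a in range(n_actions))
               for s in range(n_states)]
        # Reweight each action by exp(KL(p(s'|a) || p(s')))
        weights = []
        for a in range(n_actions):
            kl = sum(q * math.log(q / p_s[s])
                     for s, q in enumerate(p_s_given_a[a]) if q > 0)
            weights.append(p_a[a] * math.exp(kl))
        total = sum(weights)
        p_a = [w / total for w in weights]
    return mutual_information(p_a, p_s_given_a)

# Toy worlds (made up for illustration): two actions, two future states.
full_control = [[1.0, 0.0], [0.0, 1.0]]   # each action picks the next state
no_control = [[0.5, 0.5], [0.5, 0.5]]     # next state is random whatever you do
noisy_control = [[0.9, 0.1], [0.1, 0.9]]  # actions work 90% of the time

emp_full = empowerment(full_control)
emp_none = empowerment(no_control)
emp_noisy = empowerment(noisy_control)
```

In this toy setting the fully controllable world scores one bit, the noisy one scores less, and the world whose next state is random regardless of action scores zero, however unpredictable it is. That last case is the sense in which an action-conditioned signal can ignore noise that a pure prediction-improvement signal might keep chasing.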

For example: dopamine release on winning a bet has nothing to do with innate drives, it’s purely an empowerment type learning signal. This is actually just the normal learning system at work.

The brain is mostly explained by this core learning system (which perhaps has just two or three main components). The innate drives (hunger, thirst, comfort/pain, sex, etc.) are completely insufficient as signals for training the brain. They are instead satisficing drives that quickly saturate. They are secondary learning signals, but moreover they also can directly control/influence behavior in key situations, like the emotional subsystems. (Naturally there are exceptions to typical saturation: humans with a mutation causing perpetual, unsatisfiable deep hunger think about food all day long.)

Empowerment that operates over learned world state also could support easy modulation—for example by up-weighting the importance of modeling humans/agents.

The altruism/empathic component isn’t really like those innate drives (it’s not really satisfying/saturating), and so instead is more core, part of the primary utility function and learning systems. (And also probably involves its own neuromodulator component through oxytocin.)

I think that intrinsic motivation in both humans & AGIs needs to be supplemented by a “drive to pay attention to humans”, which in humans is based on superficial things like an innate brainstem circuit that disproportionately fires when hearing human speech.

Human infants grow up around humans who spend a large amount of time talking near the child. It’s actually a dominant component of the audio landscape human infants grow up in. Any reasonably competent UL system will learn a model of human speech just from this training data (and ML systems prove this). Any innate human-speech brainstem circuit is of secondary importance: perhaps it speeds up learning a bit (like the simple brainstem face detector that helps prime the cortex), but it simply cannot be necessary, as that would be incompatible with everything we know about the powerful universal learning capability of the brain.

Then once the brain has learned a recognition model of human speech, empowerment based learning is completely sufficient to learn speech production motor skills, simply by learning to maximize the mutual information between larynx motor actions and future predicted human speech audio world state. Again the brain may use some tricks to speed up learning, but the universal learning system is doing all the heavy lifting.

We also disagree about “drive for having high social status / impressing my friends”: You think it’s purely a special case of “intrinsic motivation” and thus requires no further explanation,

Once a child has learned a model of other humans (parents, friends, general models of other ‘kids’, etc.), the empowerment system naturally then tries to learn ways to control these agents. This is so difficult that it basically drives a huge chunk of subsequent learning for most people, and becomes social theory of mind and innate ‘game theory’. Social status is simply a proxy measure for influence, so it’s closely correlated with, or even just the same as, maximization of mutual info between actions and future agent beliefs (i.e. empowerment). If you think of what the word influence means, it’s actually just a definition of a specific form of empowerment.

Other low-level drives:

The ancient innate satisficing drives are what I think of as the low-level drive category (hunger, thirst, pain, sex, etc.).

And finally the core emotions (happiness, sadness, fear, anger) are a third category. They are ancient subsystems that are both behavioral triggers and learning modulators. Happiness/sadness are just manifestations of predicted utility, whereas fear and anger are innate high-stress behavior modes (flight and fight responses). Humans then inherit more complex triggers—such as the injustice/righteousness triggers for anger, and more complex derived emotions.

I would put altruism/empathy in its own category, although it’s also obviously closely connected to the emotion of love. Implementation-wise it results in mixing of the learned utility functions of external agents into the agent’s own root utility function. It is essentially evolved alignment. There are good reasons for this to evolve (basically shared genes and disposable somas), and we’ll want something similar in AGI. It’s a social component in the sense that it needs to connect the learned models of external agents to the core utility function.
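As a rough sketch of what that mixing could look like, here is one simple possibility: a weighted sum of the agent’s own utility and its learned estimates of other agents’ utilities. The additive form, the weights, and all names below are assumptions made for illustration; nothing here is meant as the brain’s actual mechanism.

```python
from typing import Callable, Sequence

Utility = Callable[[str], float]

def mixed_utility(own: Utility, others: Sequence[Utility],
                  weights: Sequence[float]) -> Utility:
    """Build a root utility that adds weighted estimates of other agents'
    utilities to the agent's own utility (additive mixing is an assumption)."""
    if len(others) != len(weights):
        raise ValueError("one weight per modeled agent is required")

    def root(outcome: str) -> float:
        return own(outcome) + sum(w * u(outcome)
                                  for u, w in zip(others, weights))
    return root

# Toy outcomes and utilities, made up for illustration.
outcomes = ["keep_food", "share_food"]
own = lambda o: {"keep_food": 1.0, "share_food": 0.4}[o]
child = lambda o: {"keep_food": 0.0, "share_food": 1.0}[o]  # learned model of another agent

selfish = mixed_utility(own, [child], [0.0])  # empathy weight of zero
caring = mixed_utility(own, [child], [0.8])   # substantial empathy weight

best_selfish = max(outcomes, key=selfish)
best_caring = max(outcomes, key=caring)
```

With a zero weight the agent keeps the food; with a weight of 0.8 the modeled agent’s benefit tips the choice toward sharing. The only point is that the preferred action can change without touching the agent’s own utility, purely through how strongly the learned models of others are connected to the root.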

I think we agree that AGI can have some or all of those human social instincts, but only if the AGI designers put them in, which would require (1) more research to nail down exactly how they’re implemented, (2) advocacy etc.

We want to align AGI, and the brain’s empathic/altruistic system could show us a practical way to achieve that. I don’t see much role for the other emotional circuitry or innate drives. So we mostly agree here except you seem more interested in various ‘social instincts’ beyond just empathy/altruism (alignment).

Where does that leave anthropomorphism?

I believe humans (and more specifically high-impact humans) are mostly explained by a universal/generic learning system optimizing for a few things: mainly some mix of empowerment, curiosity, and altruism/empathy. There are many other brain systems (innate drives, emotions, etc), but they aren’t so relevant.

I also believe brains are efficient, and thus AGI will end up being brain like—specifically it will also be mostly understandable as a universal neural learning system optimizing for some mix of empowerment, curiosity, and altruism/empathy or equivalents. There may be some other components, but they aren’t as important.

Goals and values are complex learned concepts. Initial AGI will not reinvent all of human cultural history, and instead will just absorb human values: they emerge from a universal learning system training on human world experience data, and AGI will have a similar universal learning system and similar experience training data. This doesn’t imply AGI will have the exact same values as some typical mix of humans, only that its values will be mostly sampled from within the wide human set.

From the original comment I was replying to (from Jon Garcia, not you):

There is no reason to think that the first AGIs will have goal/value structures any less alien to humans than would a superintelligent spider

There are deep reasons to believe AGI will be more anthropomorphic than not—mostly created in the image of humans. AGI will be much closer to a human mind than some hypothetical superintelligent spider.