I think I’m less optimistic than you about the formal notion of empowerment being helpful for this third bullet point, or being what we want an AGI to be maximizing for us humans. For one thing, wouldn’t we still need “correlation guided proxy matching”?
I debated what word best describes the general category of all self-motivated long-term convergence approximators, and chose ‘empowerment’ rather than ‘self-motivation’, but tried to be clear that I’m pointing at a broad category. The defining characteristic is that optimizing for empowerment should be the same as optimizing for any reasonable mix of likely long-term goals, due to convergence (and that is in fact one of the approximation methods). Human intelligence requires empowerment, as will AGI; it drives active learning, self-exploration, play, etc. (consider the appeal of video games). I’m not confident in any specific formalization being ‘the one’ at this point, let alone in the ideal approximations.
Broad empowerment is important to any model of other external humans/agents, and so yes, the proxy matching is still relevant there. Since empowerment is universal and symmetric, the agent’s own self-empowerment model could be used as a proxy via simulation. For example, humans don’t appear to innately understand and fear death, but can develop a great fear of it upon learning of their own mortality, which is only natural since death is maximally disempowering. Simulating others as oneself then helps the agent learn that others also fear death, grounding that sub-model. Something similar could work for the various other aspects of empowerment.
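To make the convergence idea above concrete, here is a minimal toy sketch (my own illustration, not any particular formalization): score each state of a small tabular MDP by its optimal value averaged over many randomly sampled reward functions. States that keep more long-term options open come out ahead under almost any sampled goal, which is the sense in which an empowerment-like quantity tracks a reasonable mix of likely long-term goals.

```python
import numpy as np

rng = np.random.default_rng(0)

def value_iteration(P, r, gamma=0.95, iters=300):
    """Optimal state values for reward-on-arrival r under transitions P[s, a, s']."""
    S, A, _ = P.shape
    V = np.zeros(S)
    for _ in range(iters):
        Q = np.einsum('sat,t->sa', P, r + gamma * V)  # Q(s, a)
        V = Q.max(axis=1)
    return V

def convergence_score(P, n_goals=100, gamma=0.95):
    """Average optimal value over many random goals: a crude stand-in for the
    'convergent mix of likely long-term goals' notion of empowerment."""
    S = P.shape[0]
    scores = np.zeros(S)
    for _ in range(n_goals):
        r = rng.random(S)  # one randomly sampled long-term goal
        scores += value_iteration(P, r, gamma)
    return scores / n_goals

# Toy 3-state MDP: states 0 and 1 can reach everything, state 2 is absorbing (a dead end).
P = np.zeros((3, 2, 3))
P[0, 0, 1] = 1.0
P[0, 1, 2] = 1.0
P[1, 0, 0] = 1.0
P[1, 1, 2] = 1.0
P[2, :, 2] = 1.0
print(convergence_score(P))  # states 0 and 1 outscore the dead-end state 2 on average
```

The absorbing state scores lowest under almost every sampled goal, which mirrors the point about death being maximally disempowering.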
(Or is there a collective notion of empowerment?)
Yeah, an agent aligned to the empowerment of multiple others would need to aggregate utilities via some approximate VCG mechanism, but that’s no different than for other utility function components.
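For reference, here is a minimal sketch of the exact (non-approximate) VCG mechanism over a finite set of candidate outcomes. The function and the example numbers are just my illustration; in this setting the Clarke payments matter mainly as the incentive-compatibility device rather than as literal transfers.

```python
import numpy as np

def vcg(reported_utilities):
    """Pick the outcome maximizing total reported utility and charge each agent
    the externality it imposes on the others (Clarke pivot rule)."""
    u = np.asarray(reported_utilities, dtype=float)  # shape: (n_agents, n_outcomes)
    chosen = int(u.sum(axis=0).argmax())
    payments = []
    for i in range(u.shape[0]):
        others = np.delete(u, i, axis=0).sum(axis=0)
        payments.append(others.max() - others[chosen])  # harm agent i's presence does to the rest
    return chosen, payments

# Three humans, four candidate futures, utilities in arbitrary units.
u = [[3, 1, 0, 2],
     [1, 4, 1, 0],
     [2, 2, 5, 1]]
print(vcg(u))  # -> (1, [0.0, 2.0, 0.0]): future 1 wins, agent 1 pays its externality
```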
Here’s another example: if the AGI comes up with 10,000 possible futures of the universe, and picks one to bring about based on how many times I blink in the next hour, then I am highly “empowered”.
From what I recall, in the info-max formulations the empowerment of a state is a measure over all actions the agent could take in that state, not just one arbitrary action, and I think it discounts for decision entropy (random decisions are not future-correlated). So in that example the AGI would need to carefully evaluate each of the 10,000 possible futures, consider all your action paths in each future and the complexity of the futures dependent on those action options, and pick the future that has the most future optionality. No, it’s not practically computable, but neither is ideal inference (Solomonoff induction) or ideal intelligence, so it’s about efficient approximations. There’s likely still much to learn from the brain there.
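A minimal sketch of the standard info-max formulation (Klyubin/Salge-style n-step empowerment, which may or may not be the exact variant intended here): the empowerment of a state is the channel capacity, in bits, between the agent’s available action sequences and the resulting future states, computed below with the Blahut-Arimoto algorithm. The capacity view is what “discounts for decision entropy”: actions whose consequences are indistinguishable from noise contribute nothing.

```python
import numpy as np

def empowerment(channel, iters=200, eps=1e-12):
    """Channel capacity (bits) between action sequences and future states,
    i.e. info-theoretic n-step empowerment, via Blahut-Arimoto.
    channel[a, s'] = p(future state s' | action sequence a)."""
    W = np.asarray(channel, dtype=float)
    A, S = W.shape
    p = np.full(A, 1.0 / A)                       # distribution over action sequences
    for _ in range(iters):
        q = p[:, None] * W
        q /= q.sum(axis=0, keepdims=True) + eps   # posterior q(a | s')
        logits = (W * np.log(q + eps)).sum(axis=1)
        p = np.exp(logits - logits.max())
        p /= p.sum()                              # capacity-achieving action distribution
    marg = p @ W                                  # induced p(s')
    mi = (p[:, None] * W * np.log((W + eps) / (marg[None, :] + eps))).sum()
    return mi / np.log(2)

# State where 8 action sequences reach 8 distinct futures vs. a state where
# every action yields the same noisy future distribution.
print(empowerment(np.eye(8)))             # ~3.0 bits of empowerment
print(empowerment(np.full((8, 8), 1/8)))  # ~0.0 bits: the "choices" don't correlate with the future
```

Note that the capacity is a property of the whole action-to-future channel, not of any single action the AGI happens to condition on, which is the distinction the reply above is pointing at.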
An AGI optimizing for your empowerment would likely just make you wealthy and immortal, and leave your happiness up to your own devices. However, it would also really, really not want to let you die, which could be a problem in some cases if pain/suffering were still an issue (although it would also seek to eliminate your pain/suffering to the extent that it interferes with your future optionality, and there is some edge-case risk that it would have incentives to alter some parts of our value system, e.g. if it determines that some emotions constrain our future optionality).
OK thanks. Hmm, maybe a better question would be: “Correlation guided proxy matching” needs a “proxy”, including a proxy computable for an AGI in the real world (because after we’re feeling good about the simbox results, we still need to re-train the AGI in the real world, right?). We can argue about whether the “proxy” should ideally be a proxy to VCG-aggregated human empowerment, versus a proxy to human happiness or flourishing or whatever. But that’s a bit beside the point until we address the question: How do we calculate that proxy? Do you think that we can write code today to calculate this proxy, and then we can go walk around town and see what that code spits out in different real-world circumstances? Or if we can’t write such code today, why not, and what line of research gets us to a place where we can write such code?
Sorry if I’m misunderstanding :)
But that’s a bit beside the point until we address the question: How do we calculate that proxy?
I currently see two potential paths, which aren’t mutually exclusive.
The first path is to reverse engineer the brain’s empathy system. My current rough guess of how the proxy-matching works for that is explained in some footnotes in section 4, and I’ve also written it out in this comment, which relates to some of your writings. In a nutshell, the oldbrain has a complex suite of mechanisms (facial expressions, gaze, voice tone, mannerisms, blink rate, pupil dilation, etc.) consisting of both subconscious ‘tells’ and ‘detectors’ that function as a sort of direct non-verbal, oldbrain-to-oldbrain communication system to speed up the grounding of newbrain external-agent models. This is the basis of empathy, evolved first for close kin (mothers simulating infant needs, etc.) and then extended and generalized. I think this is perhaps what you have labeled innate ‘social instincts’: these facilitate grounding of the newbrain models of others’ emotions/values.
The second path is to use introspection/interpretability tools to more manually locate learned models of external agents (and their values/empowerment/etc), and then extract those located circuits and use them directly as proxies in the next agent.
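One very rough sketch of what that second path could look like in code, where everything below (the synthetic activations, the labels, the probe) is a hypothetical stand-in for real interpretability tooling: fit a linear probe that locates a direction in the agent’s activations tracking the other agent’s welfare, then reuse that probe’s readout as a candidate proxy signal for the next agent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: `acts` would be hidden activations recorded while a trained
# agent observes another agent, and `labels` would mark whether the observed
# agent was being helped (1) or hindered (0). Here both are synthetic.
d = 64
welfare_direction = rng.normal(size=d)            # pretend ground-truth feature
acts = rng.normal(size=(2000, d))
labels = (acts @ welfare_direction + 0.5 * rng.normal(size=2000) > 0).astype(float)

# Linear probe: least-squares fit of the (centered) label from the activations.
w, *_ = np.linalg.lstsq(acts, labels - labels.mean(), rcond=None)

def proxy_reward(hidden_state):
    """How strongly this activation pattern expresses the located welfare feature;
    a candidate proxy to wire into the next agent's utility function."""
    return float(hidden_state @ w)

print(proxy_reward(welfare_direction), proxy_reward(-welfare_direction))  # positive, negative
```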
Do you think that we can write code today to calculate this proxy, and then we can go walk around town and see what that code spits out in different real-world circumstances? Or if we can’t write such code today, why not, and what line of research gets us to a place where we can write such code?
Neuroscientists may already be doing some of this today, or at least they could (I haven’t extensively researched this yet). They should be able to put subjects in brain scanners and ask them to read and imagine emotional scenarios that trigger specific empathic reactions, perhaps have them make consequent decisions, etc.
And of course there is some research being done on empathy in rats, some of which I linked to in the article.
I found two studies that seem relevant:
Naturalistic Stimuli in Affective Neuroimaging: A Review
https://www.frontiersin.org/articles/10.3389/fnhum.2021.675068/full
Naturalistic stimuli such as movies, music, and spoken and written stories elicit strong emotions and allow brain imaging of emotions in close-to-real-life conditions. Emotions are multi-component phenomena: relevant stimuli lead to automatic changes in multiple functional components including perception, physiology, behavior, and conscious experiences. Brain activity during naturalistic stimuli reflects all these changes, suggesting that parsing emotion-related processing during such complex stimulation is not a straightforward task. Here, I review affective neuroimaging studies that have employed naturalistic stimuli to study emotional processing, focusing especially on experienced emotions. I argue that to investigate emotions with naturalistic stimuli, we need to define and extract emotion features from both the stimulus and the observer.
An Integrative Way for Studying Neural Basis of Basic Emotions With fMRI
https://www.frontiersin.org/articles/10.3389/fnins.2019.00628/full
How emotions are represented in the nervous system is a crucial unsolved problem in the affective neuroscience. Many studies are striving to find the localization of basic emotions in the brain but failed. Thus, many psychologists suspect the specific neural loci for basic emotions, but instead, some proposed that there are specific neural structures for the core affects, such as arousal and hedonic value. The reason for this widespread difference might be that basic emotions used previously can be further divided into more “basic” emotions. Here we review brain imaging data and neuropsychological data, and try to address this question with an integrative model. In this model, we argue that basic emotions are not contrary to the dimensional studies of emotions (core affects). We propose that basic emotion should locate on the axis in the dimensions of emotion, and only represent one typical core affect (arousal or valence). Therefore, we propose four basic emotions: joy-on positive axis of hedonic dimension, sadness-on negative axis of hedonic dimension, fear, and anger-on the top of vertical dimensions. This new model about basic emotions and construction model of emotions is promising to improve and reformulate neurobiological models of basic emotions.
Oh neat, sounds like we mostly agree then. Thanks. :)
See the studies listed above.