OK thanks. Hmm, maybe a better question would be:
“Correlation guided proxy matching” needs a “proxy”, including a proxy computable for an AGI in the real world (because after we’re feeling good about the simbox results, we still need to re-train the AGI in the real world, right?). We can argue about whether the “proxy” should ideally be a proxy for VCG-aggregated human empowerment, versus a proxy for human happiness or flourishing or whatever. But that’s a bit beside the point until we address the question: How do we calculate that proxy? Do you think that we can write code today to calculate this proxy, and then go walk around town and see what that code spits out in different real-world circumstances? Or, if we can’t write such code today, why not, and what line of research gets us to a place where we can write such code?
Sorry if I’m misunderstanding :)
I currently see two potential paths, which aren’t mutually exclusive.
The first path is to reverse-engineer the brain’s empathy system. My current rough guess at how the proxy-matching works there is explained in some footnotes in section 4, and I’ve also written it out in this comment, which relates to some of your writings. In a nutshell, the oldbrain has a complex suite of mechanisms (facial expressions, gaze, voice tone, mannerisms, blink rate, pupil dilation, etc.) consisting of both subconscious ‘tells’ and ‘detectors’ that function as a sort of direct, non-verbal, oldbrain-to-oldbrain communication channel which speeds up the grounding of the newbrain’s models of external agents. This is the basis of empathy, which evolved first for close kin (mothers simulating infant needs, etc.) and was then extended and generalized. I think this is what you have perhaps labeled innate ‘social instincts’: they facilitate grounding to the newbrain’s models of others’ emotions/values.
The second path is to use introspection/interpretability tools to more manually locate the learned models of external agents (and of their values/empowerment/etc.), and then extract those circuits and use them directly as proxies in the next agent.
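To make the proxy-matching step in either path more concrete, here is a minimal toy sketch; everything in it is hypothetical (made-up names, shapes, and data). It just selects whichever learned feature correlates best with a crude innate ‘detector’ signal, and then uses that feature, rather than the raw signal, as the proxy:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (all hypothetical): T timesteps of experience, e.g. in the simbox.
# - other_agent_distress: the ground-truth quantity we'd like the proxy to track
#   (not directly available to the agent).
# - innate_signal: a crude, hardcoded "oldbrain detector" reading of it via
#   facial-expression-like cues; noisy but correlated.
# - learned_features: activations of K learned "newbrain" features over the same
#   timesteps; one of them (by construction here) models the other agent's state.
T, K = 1000, 32
other_agent_distress = rng.normal(size=T)
innate_signal = other_agent_distress + 0.5 * rng.normal(size=T)
learned_features = rng.normal(size=(T, K))
learned_features[:, 7] = other_agent_distress + 0.1 * rng.normal(size=T)

def best_matching_feature(innate, features):
    """Score each learned feature by |Pearson correlation| with the innate
    detector signal and return the index of the best match."""
    z_innate = (innate - innate.mean()) / innate.std()
    z_feats = (features - features.mean(axis=0)) / features.std(axis=0)
    corrs = z_feats.T @ z_innate / len(innate)
    return int(np.argmax(np.abs(corrs))), corrs

idx, corrs = best_matching_feature(innate_signal, learned_features)
print(f"selected feature {idx} (corr = {corrs[idx]:.2f})")

# Downstream, the reward/utility term would be computed from the selected
# learned feature (the grounded proxy) rather than from the crude innate signal.
```

The toy’s only point is the selection step: the innate signal is too crude to use directly, but it is correlated enough with the right learned feature to pick it out as the proxy going forward.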
Neuroscientists may already be doing some of this today, or at least they could (I haven’t extensively researched this yet). We should be able to put subjects in brain scanners and ask them to read and imagine emotional scenarios that trigger specific empathic reactions, perhaps have them make consequent decisions, etc.
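As a rough sketch of the analysis side of that (purely hypothetical data and names, not a real fMRI pipeline, which would use standard tooling such as GLMs with HRF modeling), one could rank region-averaged responses by how well they track a per-trial empathy rating:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical dataset: n_trials imagined scenarios, each with a self-reported
# empathy rating (1-7) and a region-averaged response for n_regions parcels.
n_trials, n_regions = 120, 200
empathy_rating = rng.integers(1, 8, size=n_trials).astype(float)
bold = rng.normal(size=(n_trials, n_regions))
bold[:, 42] += 0.6 * (empathy_rating - empathy_rating.mean())  # planted "empathy" region

# Rank regions by correlation between trial-wise response and empathy rating.
z_rating = (empathy_rating - empathy_rating.mean()) / empathy_rating.std()
z_bold = (bold - bold.mean(axis=0)) / bold.std(axis=0)
region_corr = z_bold.T @ z_rating / n_trials
top = np.argsort(-np.abs(region_corr))[:5]
print("candidate empathy-related regions:", top, region_corr[top].round(2))
```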
And of course there is some research being done on empathy in rats, some of which I linked to in the article.
I found two studies that seem relevant:
Naturalistic Stimuli in Affective Neuroimaging: A Review
https://www.frontiersin.org/articles/10.3389/fnhum.2021.675068/full
Naturalistic stimuli such as movies, music, and spoken and written stories elicit strong emotions and allow brain imaging of emotions in close-to-real-life conditions. Emotions are multi-component phenomena: relevant stimuli lead to automatic changes in multiple functional components including perception, physiology, behavior, and conscious experiences. Brain activity during naturalistic stimuli reflects all these changes, suggesting that parsing emotion-related processing during such complex stimulation is not a straightforward task. Here, I review affective neuroimaging studies that have employed naturalistic stimuli to study emotional processing, focusing especially on experienced emotions. I argue that to investigate emotions with naturalistic stimuli, we need to define and extract emotion features from both the stimulus and the observer.

An Integrative Way for Studying Neural Basis of Basic Emotions With fMRI
https://www.frontiersin.org/articles/10.3389/fnins.2019.00628/full
How emotions are represented in the nervous system is a crucial unsolved problem in the affective neuroscience. Many studies are striving to find the localization of basic emotions in the brain but failed. Thus, many psychologists suspect the specific neural loci for basic emotions, but instead, some proposed that there are specific neural structures for the core affects, such as arousal and hedonic value. The reason for this widespread difference might be that basic emotions used previously can be further divided into more “basic” emotions. Here we review brain imaging data and neuropsychological data, and try to address this question with an integrative model. In this model, we argue that basic emotions are not contrary to the dimensional studies of emotions (core affects). We propose that basic emotion should locate on the axis in the dimensions of emotion, and only represent one typical core affect (arousal or valence). Therefore, we propose four basic emotions: joy-on positive axis of hedonic dimension, sadness-on negative axis of hedonic dimension, fear, and anger-on the top of vertical dimensions. This new model about basic emotions and construction model of emotions is promising to improve and reformulate neurobiological models of basic emotions.
Oh neat, sounds like we mostly agree then. Thanks. :)
See the studies listed above.