I’m still in the process of understanding this stuff, so apologies in advance if I’m saying something stupid.
There seems to be an important disanalogy here between inner alignment in AGI and in the brain. In both cases, it’s about the difference between the inner and the outer objective, but in the AGI case, we want to make sure the inner objective is as close to the outer objective as possible, whereas in the brain, we want to make sure the outer objective doesn’t corrupt the inner objective.
I guess this makes sense: we’re misaligned from the perspective of evolution, but from our perspective, well we are the inner optimizers, so we want them to ‘win’. Conversely, if we build AI, our PoV is analogous to that of evolution, so we want the outer objective to come out on top.
The neocortex’s algorithm, as I understand it, sorta learns patterns, and patterns in the patterns, etc., and each pattern is represented as an essentially randomly-generated[5] set of neurons in the neocortex.
Are these ‘patterns’ the same as the generative models? And does ‘randomly generated’ mean that, if I learn a new pattern, my neocortex generates a random set of neurons that is then associated with that pattern from that point onward?
(I also put “pain concept” in the neocortex, again following Barrett. A giant part of the pain concept is nociception—detecting the incoming nerve signals we might call “pain sensations”. But at the end of the day, the neocortex gets to decide whether or not to classify a situation as “pain”, based on not only nociception but also things like context and valence.)
So, if a meditator says that they have mastered mindfulness to the point that they can experience pain without suffering, your explanation of that (provided you believe the claim) would be that they have reprogrammed their neocortex such that it no longer classifies the generally-pain-like signals from the subcortex as pain?
Not sure what ‘valence’ is doing here. Isn’t valence an output of what the neocortex does, or does it mean something else in the context of neuroscience?
Thanks! No worries! I’m in the process of understanding this stuff too :-P
in the AGI case, we want to make sure the inner objective is as close to the outer objective as possible, whereas in the brain, we want to make sure the outer objective doesn’t corrupt the inner objective.

I’m not sure I agree with the second one. Maybe my later discussion in Mesa-Optimizers vs Steered Optimizers is better:
You try a new food, and find it tastes amazing! This wonderful feeling is your subcortex sending a steering signal up to your neocortex. All of the sudden, a new goal has been installed in your mind: eat this food again! This is not your only goal in life, of course, but it is a goal, and you might use your intelligence to construct elaborate plans in pursuit of that goal, like shopping at a different grocery store so you can buy that food again.
It’s a bit creepy, if you think about it!
“You thought you had a solid identity? Ha!! Fool, you are a puppet! If your neocortex gets dopamine at the right times, all of the sudden you would want entirely different things out of life!”
Yes I do take the perspective of the inner optimizer, but I have mixed feelings about my goals changing over time as a result of the outer layer’s interventions. Like, if I taste a new food and really like it, that changes my goals, but that’s fine, in fact that’s a delightful part of my life. Whereas, if I thought that reading nihilistic philosophers would carry a risk of making me stop caring about the future, I would be reluctant to read nihilistic philosophers. Come to think of it, neither of those is a hypothetical!
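To make the “a new goal gets installed” picture a bit more concrete, here’s a toy sketch (purely illustrative; the names, numbers, and update rule are all invented, and nothing this simple is a claim about the actual brain): a one-off reward signal bumps up the learned value of a state, and planning then starts favoring plans that lead back to it.

```python
# Toy illustration of a steering signal installing a new goal (all values and
# names are invented for this example; not a model of the actual brain).

values = {"eat_familiar_food": 1.0, "eat_new_food": 0.0}  # learned values
LEARNING_RATE = 0.5

def steering_signal(state, reward):
    """A reward signal nudges the learned value of the state it followed."""
    values[state] += LEARNING_RATE * (reward - values[state])

def best_plan():
    """Planning: pursue whichever state currently has the highest value."""
    return max(values, key=values.get)

print(best_plan())                       # eat_familiar_food
steering_signal("eat_new_food", 10.0)    # "this tastes amazing!"
print(best_plan())                       # eat_new_food -- a new goal is in play
```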
Are these ‘patterns’ the same as the generative models?

Yes. Kaj calls (a subset of) them “subagents”, I more typically call them “generative models”, Kurzweil calls them “patterns”, Minsky calls this idea “society of mind”, etc.
And does ‘randomly generated’ mean that, if I learn a new pattern, my neocortex generates a random set of neurons that is then associated with that pattern from that point onward?
Yes, that’s my current belief fwiw, although to be clear, I only think it’s random on a micro-scale. On the large scale, for example, patterns in raw visual inputs are going to be mainly stored in the part of the brain that receives raw visual inputs, etc. etc.
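Purely as an illustration of what “random on a micro-scale, organized on a macro-scale” could mean (my own toy sketch; the region names and pool sizes are made up), learning a new pattern might look like drawing a small random subset of neurons, but only from the region that handles that kind of input:

```python
import random

# Toy sketch: each region has its own pool of neuron IDs; a newly learned
# pattern is assigned a small random subset drawn from the relevant region.
# Random at the micro-scale (which particular neurons), organized at the
# macro-scale (which region). All numbers here are invented.

REGIONS = {
    "visual": range(0, 10_000),
    "auditory": range(10_000, 20_000),
}

learned_patterns = {}

def learn_pattern(name, modality, ensemble_size=50):
    """Associate a new pattern with a random set of neurons in its region."""
    ensemble = frozenset(random.sample(REGIONS[modality], ensemble_size))
    learned_patterns[name] = ensemble  # this same set represents it from now on
    return ensemble

cells = learn_pattern("grandmother's face", "visual")
print(sorted(cells)[:5])
```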
So, if a meditator says that they have mastered mindfulness to the point that they can experience pain without suffering, your explanation of that (provided you believe the claim) would be that they have reprogrammed their neocortex such that it no longer classifies the generally-pain-like signals from the subcortex as pain?
Sure, but maybe a more everyday example would be a runner pushing through towards the finish line while experiencing runner’s high, or a person eating their favorite spicy food, or whatever. It’s still the same sensors in your body sending signals, but in those contexts you probably wouldn’t describe those signals as “I am in pain right now”.
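As a purely illustrative sketch of that “the neocortex decides whether to classify it as pain” framing (my own toy example; the inputs, weights, and threshold are all invented), the same nociceptive signal can come out labeled differently depending on the other inputs:

```python
# Toy sketch: same nociceptive input, different classification depending on
# context and valence. Weights and threshold are made up for illustration.

def classify_as_pain(nociception, context, valence):
    """Return True if the situation gets categorized as 'pain'."""
    score = nociception
    if context in ("running_a_race", "eating_favorite_spicy_food"):
        score -= 0.5           # reinterpreted as effort / pleasant burn
    score -= 0.3 * valence     # positive valence pushes against the label
    return score > 0.5

print(classify_as_pain(0.8, "stubbed_toe_at_home", valence=-0.5))  # True
print(classify_as_pain(0.8, "running_a_race", valence=0.8))        # False
```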
As for valence, I was confused about valence when I wrote this, and it’s possible I’m still confused. But I felt less confused after writing Emotional Valence Vs RL Reward: A Video-Game Analogy. I’m still not sure it’s right—just the other day I was thinking that I should have said “positive reward prediction error” instead of “reward” throughout that article. I’m going back and forth on that, not sure.
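For reference, here are the standard temporal-difference definitions of those two terms (textbook RL, not a claim about which one valence actually tracks):

```python
# Reward vs. reward prediction error, in standard RL terms (illustrative only):
#   reward:                  r
#   reward prediction error: delta = r + gamma * V(s_next) - V(s_now)
# A moment can carry negative reward yet a positive prediction error, if
# things are going better than expected.

def reward_prediction_error(r, v_next, v_now, gamma=0.9):
    return r + gamma * v_next - v_now

# e.g. a mildly unpleasant moment (r = -1) during which prospects improve a lot:
print(reward_prediction_error(r=-1.0, v_next=10.0, v_now=2.0))  # 6.0 > 0
```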
As for valence, I was confused about valence when I wrote this, and it’s possible I’m still confused. But I felt less confused after writing Emotional Valence Vs RL Reward: A Video-Game Analogy. I’m still not sure it’s right—just the other day I was thinking that I should have said “positive reward prediction error” instead of “reward” throughout that article. I’m going back and forth on that, not sure.
My prior understanding of valence, which is primarily influenced by the Qualia Research Institute, was as the ontologically fundamental utility function of the universe. The claim is that every slice of experience has an objective quantity (its valence) that measures how pleasant or unpleasant it is. This would locate ‘believing valence is a thing’ as a subset of moral realism.
My assumption after reading this post was that you’re talking about something else, but your Emotional Valence vs. RL Reward post made me think you’re talking about the same thing after all, especially this paragraph:
For example, some people say anger is negative valence, but when I feel righteous anger, I like having that feeling, and I want that feeling to continue. (I don’t want to want that feeling to continue, but I do want that feeling to continue!) So by my definition, righteous anger is positive valence!
This sounds to me like you’re talking about how pleasant the state fundamentally is. But then I noticed someone already linked a QRI talk and you didn’t think it was talking about the same thing, so I’m back to being confused.
Are you thinking about valence as something fundamental, or as an abstraction? And if it’s an abstraction, does the term do any work? I guess what confuses me is that the Emotional Valence [...] post makes it sound like you think valence is an important concept, but if you consider it an abstraction, I’m not clear on why. It seems like you could take valence out of the picture and your model of the brain would still work fine.
Anyway, this might not be super important to discuss further, so feel free to leave it here.
I noticed someone already linked a QRI talk and you didn’t think it was talking about the same thing, so I’m back to being confused.
When I wrote here “Thanks but I don’t see the connection between what I wrote and what they wrote”, I did not mean that QRI was talking about a different phenomenon than I was talking about. I meant that their explanation is wildly different than mine. Re-reading the conversation, I think I was misinterpreting the comment I was replying to; I just went back to edit.
Needless to say I disagree with the QRI explanation of valence, but there’s such a chasm between their thoughts and my thoughts that it would be challenging for me to try to write a direct rebuttal.
Again, I do think they’re talking about the same set of phenomena that I’m talking about.
My prior understanding of valence, which is primarily influenced by the Qualia Research Institute, was as the ontologically fundamental utility function of the universe. The claim is that every slice of experience has an objective quantity (its valence) that measures how pleasant or unpleasant it is. This would locate ‘believing valence is a thing’ as a subset of moral realism.
I don’t think there’s anything fundamental in the universe besides electrons, quarks, photons, and so on, following their orderly laws as described by the Standard Model of Particle Physics etc. Therefore it follows that there should be an answer to the question “why do people describe a certain state as pleasant” that involves purely neuroscience / psychology and does not involve the philosophy of consciousness or any new ontologically fundamental entities. After all, “describing a certain state as pleasant” is an observable behavioral output of the brain, so it should have a chain of causation that we can trace within the underlying neural algorithms, which in turn follows from the biochemistry of firing neurons and so on, and ultimately from the laws of physics. So, that’s what I was trying to do in that blog post: Trace a chain of causation from “underlying neural algorithms” to “people describing a state as pleasant”.
After we do that (and I have much more confidence that “a solution exists” than “my particular proposal is the 100% correct solution”) we can ask: is there a further unresolved question of how that exercise we just did (involving purely neuroscience / psychology) relates to consciousness and qualia and whatnot? My answer would be “No. There is nothing left to explain.”, for reasons discussed more at Book Review: Rethinking Consciousness, but I acknowledge that’s a bit counterintuitive and can’t defend it very eloquently, since I haven’t really dived into the philosophical literature.
When I wrote here “Thanks but I don’t see the connection between what I wrote and what they wrote”, I did not mean that QRI was talking about a different phenomenon than I was talking about. I meant that their explanation is wildly different than mine. Re-reading the conversation, I think I was misinterpreting the comment I was replying to; I just went back to edit.
That makes sense.
I don’t think there’s anything fundamental in the universe besides electrons, quarks, photons, and so on, following their orderly laws as described by the Standard Model of Particle Physics etc. Therefore it follows that there should be an answer to the question “why do people describe a certain state as pleasant” that involves purely neuroscience / psychology and does not involve the philosophy of consciousness or any new ontologically fundamental entities. After all, “describing a certain state as pleasant” is an observable behavioral output of the brain, so it should have a chain of causation that we can trace within the underlying neural algorithms, which in turn follows from the biochemistry of firing neurons and so on, and ultimately from the laws of physics. So, that’s what I was trying to do in that blog post: Trace a chain of causation from “underlying neural algorithms” to “people describing a state as pleasant”.
Ah, but QRI also thinks that the material world is exhaustively described by the laws of physics. I believe they would give a blanket endorsement to everything in the above paragraph except the first sentence. Their view is not that valence is an additional parameter that your model of physics needs to take into consideration to be accurate. Rather, it’s that the existing laws of physics exhaustively describe the future states of particles (so in particular, you can explain the behavior of humans, including their reaction to pain and such, without modeling valence), and the phenomenology can also be described precisely. The framework is dual-aspect monism plus physicalism.
You might still have substantial disagreements with that view, but as far as I can tell, your posts about neuroscience and even your post on emotional valence are perfectly compatible with it, except for the one sentence I quoted earlier
the neocortex gets to decide whether or not to classify a situation as “pain”, based on not only nociception but also things like context and valence.
because it has valence as an input to the neocortex’s decision rather than as a property of the output (i.e., if our phenomenology ‘lives’ in the neocortex, then the valence of a situation should depend on what the neocortex classifies it as, not vice versa). And even that could just be using valence to refer to a different thing that’s also real.
Ok, thanks for helping me understand. Hmm. I hypothesize that the most fundamental part of why I disagree with them is basically what Eliezer talked about here as “explaining” vs. “explaining away”. I think they’re looking for an explanation, i.e. a thing in the world whose various properties match the properties of consciousness and qualia as they seem to us. I’m much more expecting that there is no thing in the world meeting those criteria. Rather I think that this is a case where our perceptions are not neutrally reporting on things in the world, and thus where “the way things seem to us” is different than the way things are.
Or maybe I just narrowly disagree with QRI’s ideas about rhythms and harmony and so on. Not sure. Whenever I try to read QRI stuff it just kinda strikes me as totally off-base, so I haven’t spent much time with it, beyond skimming a couple articles and watching a talk on Connectome-Specific Harmonic Waves on YouTube a few months ago. I’m happy to have your help here :-)
As for valence, yes I think that valence is an input to the neocortex subsystem (just as vision is an input), although it’s really the neocortex subsystem observing the activity of other parts of the brain, and incidentally those other parts of the brain also depend in part on what the neocortex is doing and has been doing.
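Here’s a schematic of that loop, as a toy sketch only (the class names and the toy rule inside are invented, not a validated model): the neocortex receives valence as one more input channel, but the signal it receives is computed by other systems whose state partly depends on what the neocortex has been doing.

```python
# Schematic toy sketch of the loop described above (illustrative only).

class Subcortex:
    def valence_signal(self, neocortex_activity):
        # Toy rule: the signal sent up depends on what the neocortex is doing.
        return 1.0 if "anticipating_food" in neocortex_activity else -0.2

class Neocortex:
    def __init__(self):
        self.activity = {"anticipating_food"}

    def step(self, vision, valence):
        # Valence arrives as an input alongside vision; the neocortex then
        # updates its own activity, which feeds into the next valence signal.
        if valence > 0:
            self.activity.add("keep_doing_this")
        return self.activity

subcortex, neocortex = Subcortex(), Neocortex()
for _ in range(3):
    v = subcortex.valence_signal(neocortex.activity)
    neocortex.step(vision=None, valence=v)
print(neocortex.activity)  # {'anticipating_food', 'keep_doing_this'}
```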
PrincipiaQualia is definitely the thing to read if you want to engage with QRI. It reviews the science and explains the core theory that the research is structured around. I’m not sure if you want to engage with it—I begin from the strong intuition that qualia are real, and so I’m delighted that someone is working on it. My impression is that it makes an excellent case, but my judgment is severely limited since I don’t know the literature. Either way, it doesn’t have a lot of overlap with what you’re working on.

There’s also an AI alignment podcast episode.