Eliezer proposes “boredom” as an example of a human value (which could either be its own shard or a term in the utility function). I don’t think this is a good example. It’s fairly high-level, and it’s instrumental to other values.
Can you elaborate on “is instrumental to other values”? Here’s why I find that confusing:
- From the perspective of evolution, everything (from friendship to pain aversion) “is instrumental” to inclusive genetic fitness.
- From the perspective of within-lifetime learning algorithms, I don’t think boredom is instrumental to other stuff. I think humans find boredom inherently demotivating, i.e. it’s its own (negative) reward, i.e. boredom is pretty directly “a term in the human brain reward function”, so to speak, one that’s basically part of curiosity drive (where curiosity drive is well-known in the RL literature, and I think it’s part of RL-in-the-human-brain-as-designed-by-evolution too; see the toy sketch after this list). (Maybe you’re disagreeing with me on that though? I acknowledge that my claim in this bullet point is not trivially obvious.)
- From a within-lifetime perspective, getting bored is instrumentally useful for doing “exploration” that results in finding useful things to do, which can be economically useful, be effective signalling of capacity, build social connection, etc. Curiosity is partially innate but it’s also probably partially learned. I guess that’s not super different from pain avoidance. But anyway, I don’t worry about an AI that fails to get bored, but is otherwise basically similar to humans, taking over, because not getting bored would result in being ineffective at accomplishing open-ended things.
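To make “a term in the reward function” concrete, here’s a toy sketch in the style of count-based exploration bonuses from the RL literature. It’s purely illustrative (the class, names, and numbers are mine, not any particular library’s API, and certainly not a claim about how the brain actually implements this):

```python
from collections import Counter

class CuriosityBonus:
    """Toy count-based novelty bonus; all names/numbers are illustrative."""

    def __init__(self, scale=0.1):
        self.visits = Counter()  # how often each state has been seen
        self.scale = scale

    def bonus(self, state):
        # Familiar states earn a smaller bonus, so repetition is relatively
        # demotivating: "boredom" is the flip side of the curiosity term.
        self.visits[state] += 1
        return self.scale / self.visits[state] ** 0.5

def shaped_reward(task_reward, state, curiosity):
    # Curiosity/boredom enters as one additive term in the reward function.
    return task_reward + curiosity.bonus(state)
```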
> From a within-lifetime perspective, getting bored is instrumentally useful for doing “exploration” that results in finding useful things to do, which can be economically useful, be effective signalling of capacity, build social connection, etc.
Maybe fear-of-heights is a clearer example.
You can say “From a within-lifetime perspective, fear-of-heights is instrumentally useful because if you fall off a cliff and die then you can’t accomplish anything else.” But that’s NOT the story of why (from a within-lifetime perspective) the fear-of-heights is there. It’s there because it’s innate—we’re born with it, and we would be afraid of heights even if we grew up in an environment where fear-of-heights is not instrumentally useful. And separately, the reason we’re born with it is that it’s instrumentally useful from an evolutionary perspective. Right?
> Curiosity is partially innate but it’s also probably partially learned
Sure. I agree.
> But anyway, I don’t worry about an AI that fails to get bored, but is otherwise basically similar to humans, taking over, because not getting bored would result in being ineffective at accomplishing open-ended things.
Hmm, I kinda think the opposite. I think if you were making “an AI basically similar to humans”, and just wanted to maximize its capabilities leaving aside alignment, you would give it innate intrinsic boredom during “childhood”, but you would make that drive gradually fade to zero over time, because eventually the AI will develop learned metacognitive strategies that accomplish the same things that boredom would accomplish, but better (more flexible, more sophisticated, etc.). I was just talking about this in this thread (well, I was talking about curiosity rather than boredom, but that’s two sides of the same coin).
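Concretely, the schedule I have in mind looks something like the following. This is a minimal sketch, assuming a linear fade and a made-up “childhood” horizon; the point is just that the innate drive goes to zero while learned strategies take over:

```python
def intrinsic_weight(step, childhood_steps=1_000_000):
    # Full innate curiosity/boredom drive early on ("childhood"),
    # fading linearly to zero; the horizon is an arbitrary placeholder.
    return max(0.0, 1.0 - step / childhood_steps)

def total_reward(task_reward, curiosity_bonus, step):
    # Once the weight hits zero, exploration is left to whatever learned
    # metacognitive strategies the agent has acquired by then.
    return task_reward + intrinsic_weight(step) * curiosity_bonus
```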
There are evolutionary priors for what to be afraid of but some of it is learned. I’ve heard children don’t start out fearing snakes but will easily learn to if they see other people afraid of them, whereas the same is not true for flowers (sorry, can’t find a ref, but this article discusses the general topic). Fear of heights might be innate but toddlers seem pretty bad at not falling down stairs. Mountain climbers have to be using mainly mechanical reasoning to figure out which heights are actually dangerous. It seems not hard to learn the way in which heights are dangerous if you understand the mechanics required to walk and traverse stairs and so on.
Instincts like curiosity are more helpful at the beginning of life; over time, they can be learned as instrumental goals. If an AI learns advanced metacognitive strategies instead of innate curiosity, that’s not obviously a big problem from a human-values perspective, but it’s unclear.
Some of this is my opinion rather than consensus, but in case you’re interested:
- I believe that the human brainstem (superior colliculus) has an innate detector of certain specific visual things including slithering-like-a-snake and scuttling-like-a-spider, and when it detects those things, it executes an “orienting reaction” which involves not only eye-motion and head-turns but also conscious attention, and it also induces physiological arousal (elevated heart-rate etc.). That physiological arousal is not itself fear—obviously we experience physiological arousal in lots of situations that are not fear, like excitement, anger, etc.—but the arousal and attention do set up a situation in which a fear-response can be very easily learned. (Various brain learning algorithms are also doing various other things in the meantime, such that adults can wind up with that innate response getting routinely suppressed.) See the toy sketch after this list.
- My experience is that stairs don’t trigger fear-of-heights too much because you’re not looking straight down off a precipice. Also, I think sufficiently young babies don’t have fear-of-heights? I forget.
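If it helps, here’s a toy model of that gating story: a minimal sketch of the “innate detector boosts plasticity, fear itself is learned” idea (my own illustration, not an established model):

```python
def update_fear(fear, stimulus_present, bad_outcome, arousal, base_lr=0.01):
    """Toy Rescorla-Wagner-style update; all names/numbers are illustrative.

    The innate circuit doesn't install fear directly; it boosts plasticity
    (via attention/arousal) so that a fear-response is very easily learned
    when the flagged stimulus co-occurs with something aversive.
    """
    if not stimulus_present:
        return fear
    lr = base_lr * (1.0 + 10.0 * arousal)  # arousal in [0, 1] gates learning speed
    return fear + lr * (bad_outcome - fear)
```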
I’m not making any grand point here, just chatting.