If anyone can deploy an AGI that is less than 50% likely to kill more than a billion people, then they’ve probably… well, they’ve probably found a way to keep their AGI weak enough that it isn’t very useful.
What about AGI that is basically just virtual humans?
For what it’s worth, Eliezer in 2018 said that he’d be pretty happy with that:
If the subject is Paul Christiano, or Carl Shulman, I for one am willing to say these humans are reasonably aligned; and I’m pretty much okay with somebody giving them the keys to the universe in expectation that the keys will later be handed back.
(Obviously “Eliezer in 2018” ≠ “Nate today”; Nate can chime in if he disagrees with the above.)
Incidentally, I’ve shown the above quote to a lot of people who say “yes, that’s perfectly obvious”, and to a lot of others who say “Eliezer is being insufficiently cynical; absolute power corrupts absolutely”. For my part, I don’t have a strong opinion, but on my models, if we know how to make virtual humans, then we probably know how to make virtual humans without envy, status drive, teenage angst, etc., which should help somewhat. More discussion here.
Yeah, largely agree (and with the linked post), but status drive seems likely to be heavily entangled with empowerment in social creatures. For example, I recall that even lobsters have a simple detector of social status (based on a serotonin signaling mechanism), and since they compete socially for resources, social status is a strong predictor of future optionality and thus an empowerment signal.
I also agree that AGI will likely be (or appear) conscious/sentient in the way we are (or appear to be), and that this is probably impossible to avoid without trading off generality/capability. EY seems to have just decided early on that since conscious AGI is problematic, it shan’t be so.
Corruption-by-power (and related issues) seem like problems worth thinking about here. Though they also strike me as problems that humans tend to be very vigilant about / concerned with by default, and problems that become a lot less serious if you’ve got a lot of emulated copies of different individuals, rather than just copies of a single individual.
that’s probably impossible to avoid without trading off generality/capability
You need to trade off some generality/capability anyway for the sake of alignment. One hope (though not the only one) might be that there’s overlap between the capabilities we want to remove for the sake of alignment, and the ones we want to remove for the sake of reducing-the-risk-that-the-AGI-is-conscious.
E.g., if you want your AGI to build nanotech for you and do nothing else, then you might want to limit its ability to think about itself, or its operators, or the larger world, or indeed anything other than different small-scale physical structures. Limiting its generality and self-awareness in this way might also be helpful for reducing the risk that it’s conscious.
EY seems to have just decided earlier on that since conscious AGI is problematic, it shan’t be so.
Where has EY said that he’s confident the first AGI systems won’t be conscious?
E.g., if you want your AGI to build nanotech for you and do nothing else, then you might want to limit its ability to think about itself, or its operators, or the larger world, or indeed anything other than different small-scale physical structures. Limiting its generality and self-awareness in this way might also be helpful for reducing the risk that it’s conscious.
I don’t quite get this example.
How could such a system build nanotech efficiently without it having those properties? Wouldn’t it need a human operator the moment it encountered unexpected phenomena?
If so, it just seems like a really fancy hammer and not an ‘AGI’.
Wouldn’t that require solving alignment in itself, though? If you can simulate virtual humans, complete with human personalities, human cognition, and human values, then you’ve already figured out how to plug human values straight into a virtual agent.
If you mean that the AGI is trained on human behavior to the point where it’s figured out human values through IRL/predictive coding/etc. and is acting on them, then that’s also basically just solving alignment.
However, if you’re suggesting brain uploads, I highly doubt that such technology would be available before AGI is developed.
All that is to say that, while an AGI that is basically just virtual humans would probably be great, it’s not a prospect we can depend on in lieu of alignment research. Such a result could only come about through actually doing all the hard work of alignment research first.
Wouldn’t that require solving alignment in itself, though?
Yes, but only to the same extent that evolution did. Evolution approximately solved alignment on two levels: aligning brains with the evolutionary goal of inclusive fitness[1], and aligning individual brains (as disposable somas) with other brains that share kin genes, via altruism (the latter is the thing we want to emulate; a rough formal sketch follows the footnote).
Massively successful: a population of ~8B humans vs. a few hundred thousand for all other great apes combined. It’s fashionable to say evolution failed at alignment; this is just stupidly wrong, humans are an enormous success from the perspective of inclusive fitness.
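To make the kin-altruism level a bit more concrete, here is a minimal sketch using the standard population-genetics framing (not anything specific to the comment above): Hamilton’s rule says a gene for altruistic behavior toward relatives is favored by selection roughly when

$$ r \cdot b > c $$

where $r$ is the genetic relatedness between actor and recipient, $b$ is the reproductive benefit to the recipient, and $c$ is the reproductive cost to the actor. That’s the sense in which evolution “aligned” individual brains with other brains carrying shared genes: sacrifice by one soma is selected for whenever the relatedness-weighted benefit to kin outweighs the cost.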
Do you propose using evolutionary simulations to discover other-agent-aligned agents? I doubt we have the same luxury of (simulated) time that evolution had in creating humans. Evolution didn’t have to compete against an intelligent designer; alignment researchers do (namely, the broader AI community).
I agree that humans are highly successful (though far from optimal) at both inclusive genetic fitness and alignment with fellow sapients. However, the challenge for us now is to parse the system that resulted from this messy evolutionary process, to pull out the human value system from human neurophysiology. Either that, or figure out general alignment from first principles.
Do you propose using evolutionary simulations to discover other-agent-aligned agents?
Nah. The Wright brothers didn’t need to run evo sims to reverse-engineer flight. They just observed how birds bank to turn, noticed that this relied on wing warping, and said: cool, we can do that too! Deep learning didn’t succeed through brute-force evo sims either (even though Karl Sims’ evo sims work is pretty cool, it turns out that loose reverse engineering is just enormously faster).
However, the challenge for us now is … to pull out the human value system from human neurophysiology. Either that, or figure out general alignment from first principles.
Sounds about right. Fortunately, we may not need to model human values at all in order to build general altruistic agents: it probably suffices that the AI optimizes for human empowerment (our ability to fulfill any long-term future goals, rather than any specific values), which is a much simpler and more robust target, and thus probably more stable over the long term.
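To make “empowerment” concrete, a minimal sketch using the standard information-theoretic definition (which may or may not be exactly the notion intended here): an agent’s $n$-step empowerment at state $s$ is the channel capacity from its next $n$ actions to the resulting state, i.e. how reliably it can steer the future into many distinguishable outcomes:

$$ \mathfrak{E}_n(s) \;=\; \max_{p(a_{1:n})} I\big(A_{1:n};\, S_{t+n} \,\big|\, S_t = s\big) $$

Under this framing, the proposal is that the AI acts so as to keep the human’s empowerment high, rather than optimizing for any particular human goal. For deterministic dynamics this reduces to the log of the number of distinct future states the human can still reach, which matches the “future optionality” framing earlier in the thread.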