For what it’s worth, Eliezer in 2018 said that he’d be pretty happy with that:
If the subject is Paul Christiano, or Carl Shulman, I for one am willing to say these humans are reasonably aligned; and I’m pretty much okay with somebody giving them the keys to the universe in expectation that the keys will later be handed back.
(Obviously “Eliezer in 2018” ≠ “Nate today”; Nate can chime in if he disagrees with the above.)
Incidentally, I’ve shown the above quote to a lot of people who say “yes, that’s perfectly obvious”, and I’ve also shown it to a lot of people who say “Eliezer is being insufficiently cynical; absolute power corrupts absolutely”. For my part, I don’t have a strong opinion, but on my models, if we know how to make virtual humans, then we probably know how to make virtual humans without envy and without status drive and without teenage angst etc., which should help somewhat. More discussion here.
Yeah, largely agree (and with the linked post), but status drive seems likely to be heavily entangled with empowerment in social creatures. For example, I recall that even lobsters have a simple detector of social status (based on a serotonin signaling mechanism), and since they compete socially for resources, social status is a strong predictor of future optionality and thus an empowerment signal.
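(By “empowerment signal” I mean roughly the formal notion from the intrinsic-motivation literature: how much influence the agent can reliably exert over its own future, measured as the channel capacity between an n-step action sequence and the resulting state,

$\mathcal{E}(s_t) = \max_{p(a_t^{n})} I\left(A_t^{n};\, S_{t+n} \mid s_t\right),$

where $s_t$ is the current state. Anything that reliably predicts future optionality, such as rank in a dominance hierarchy, is then a cheap proxy for this quantity.)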
Also agree that AGI will likely be (or appear) conscious/sentient the way we are (or appear), and that’s probably impossible to avoid without trading off generality/capability. EY seems to have just decided earlier on that since conscious AGI is problematic, it shan’t be so.
Corruption-by-power (and related issues) seem like problems worth thinking about here. Though they also strike me as problems that humans tend to be very vigilant about / concerned with by default, and problems that become a lot less serious if you’ve got a lot of emulated copies of different individuals, rather than just copies of a single individual.
that’s probably impossible to avoid without trading off generality/capability
You need to trade off some generality/capability anyway for the sake of alignment. One hope (though not the only one) might be that there’s overlap between the capabilities we want to remove for the sake of alignment, and the ones we want to remove for the sake of reducing-the-risk-that-the-AGI-is-conscious.
E.g., if you want your AGI to build nanotech for you and do nothing else, then you might want to limit its ability to think about itself, or its operators, or the larger world, or indeed anything other than different small-scale physical structures. Limiting its generality and self-awareness in this way might also be helpful for reducing the risk that it’s conscious.
EY seems to have just decided earlier on that since conscious AGI is problematic, it shan’t be so.
Where has EY said that he’s confident the first AGI systems won’t be conscious?
E.g., if you want your AGI to build nanotech for you and do nothing else, then you might want to limit its ability to think about itself, or its operators, or the larger world, or indeed anything other than different small-scale physical structures. Limiting its generality and self-awareness in this way might also be helpful for reducing the risk that it’s conscious.
I don’t quite get this example.
How could such a system build nanotech efficiently without having those properties? Wouldn’t it need a human operator the moment it encountered unexpected phenomena?
If so, it just seems like a really fancy hammer and not an ‘AGI’.