My personal opinion is that empathy is the one most likely to work. Most proposed alignment solutions feel to me like patches rather than solutions to the underlying problem: the AI not intrinsically caring about the welfare of other beings. If it did care, it would figure out how to align itself. So that's the one I'm most interested in. I think Steven Byrnes has some interest in it as well; he thinks we ought to figure out how human social instincts are coded in the brain.
Hmmm, yes and no?
E.g., many people who care about animal welfare differ on the decisions they would make on those animals' behalf. What if the AGI ends up a negative utilitarian and sterilizes us all to save humanity from all future suffering? The missing element would again be having the AGI aligned with humanity, which brings us back to H4: what is humanity's alignment, anyway?
I think “humanity’s alignment” is a strange term to use. Perhaps you mean “humanity’s values” or even “humanity’s collective utility function.”
I'll clarify what I mean by empathy here. I think the ideal form of empathy is wanting others to get what they themselves want. Since entities compete for scarce resources and tend to interfere with one another's desires, this forces tradeoffs about how much weight to give each desire, but in principle this still seems like the right ideal to me.
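To make that scarcity tradeoff concrete, here is a toy sketch in Python. It is purely illustrative: the agent names, the numbers, and the linear "fraction of desire met" satisfaction model are my own assumptions, not anything proposed in this thread. The objective is just how well each agent's own stated desire is satisfied, summed across agents, under a shared budget too small to satisfy everyone.

```python
# Toy illustration of the tradeoff described above: the "empathic" objective is
# how well each agent's OWN desire is met, and a shared budget forces tradeoffs.
# Agent names, numbers, and the linear satisfaction model are all made up.
from itertools import product

desired = {"alice": 6, "bob": 5, "carol": 4}  # each agent's own desired amount
BUDGET = 10  # less than the total desired (15), so someone must be shortchanged

def satisfaction(agent, amount):
    """Fraction of the agent's own desire that is met, capped at 1.0."""
    return min(amount / desired[agent], 1.0)

def empathic_score(allocation):
    """Sum over agents of how satisfied each one is by their OWN lights."""
    return sum(satisfaction(a, x) for a, x in allocation.items())

# Brute-force search over integer allocations that fit within the budget.
agents = list(desired)
best, best_score = None, float("-inf")
for amounts in product(range(BUDGET + 1), repeat=len(agents)):
    if sum(amounts) <= BUDGET:
        alloc = dict(zip(agents, amounts))
        score = empathic_score(alloc)
        if score > best_score:
            best, best_score = alloc, score

print(best, round(best_score, 3))  # e.g. {'alice': 1, 'bob': 5, 'carol': 4} 2.167
```

With this linear model the search simply favors the cheapest-to-satisfy desires first; the only point is that "wanting others to get what they themselves want" still forces explicit tradeoffs once resources run short.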
So negative utilitarianism is not really empathic in this sense, since it is not concerned with the right of the entities in question to decide about their own futures. In fact I think it's one of the most dangerous and harmful philosophies I've ever seen, and an AI of the kind I would like to see built would reject it altogether.