It seems to me that the second problem falls under the more general category of “Competing superintelligent AI systems could do bad things, even if they are aligned”. Is there a reason corruption of values is particularly salient to you? Or would you categorize it as about as important as the problem of superintelligent AI systems getting into an arms race? Maybe you think that corruption of values leads to much more value loss than anything else? (I don’t see why that would be true.)
Are you hoping that we come up with different solutions that make defense easier than offense for each of these possible threats? It seems more important to me to work on trying not to get into this situation in the first place. (I also make this claim on the current margin.) However, that does seem particularly difficult to achieve, so I’d love for someone to think this through and realize that we actually do have a nice technical solution that spares us from having to get different groups of humans to cooperate with each other.
Good question. :)
general category of “Competing superintelligent AI systems could do bad things, even if they are aligned”

This general category could potentially be solved by AIs being very good at cooperating with other AIs. For example, maybe AIs can merge together in a secure/verifiable way. (How to ensure this seems to be another overly neglected topic.) However, the terms of any merger will likely reflect the pre-merger balance of power, and in this particular competitive arena that balance seems (by default) to disfavor people who have a proper amount of value complexity and moral uncertainty (as I suggested in the OP).