The best argument here probably comes from Paul Christiano. To summarize: even if we mess up fairly badly in aligning the AI, as long as the failure mode isn't deceptive alignment but rather a misgeneralization of human preferences (a non-deceptive alignment failure), it's fairly likely that the AI will retain at least some human-regarding preferences. That means it will do some acts of niceness when they are cheap for it, and preserving humans is very cheap for a superintelligent AI.
More answers can be found here:
https://www.lesswrong.com/posts/xvBZPEccSfM8Fsobt/what-are-the-best-arguments-for-against-ais-being-slightly#qsmA3GBJMrkFQM5Rn
https://www.lesswrong.com/posts/87EzRDAHkQJptLthE/but-why-would-the-ai-kill-us?commentId=sEzzJ8bjCQ7aKLSJo
https://www.lesswrong.com/posts/2NncxDQ3KBDCxiJiP/cosmopolitan-values-don-t-come-free?commentId=ofPTrG6wsq7CxuTXk
https://www.lesswrong.com/posts/87EzRDAHkQJptLthE/but-why-would-the-ai-kill-us?commentId=xK2iHGJfHvmyCCZsh