I want to thank the author for this post; it is a very interesting read. I see there are already many comments and I have not had the time to read them thoroughly, so my apologies if what I state below has already been discussed.
The author’s point that I wish to challenge is the third one: “alignment is not about loving humanity; it’s about robust reasonable compliance”.
I agree that embedding “love” for humanity (in a principled way) is a bad solution and that it cannot lead in a favourable direction: even in the best scenario, it would likely disempower us. However, I disagree with the claim that “human values” are not part of the solution either.
Suppose that alignment is solved in the form of “robust reasonable compliance”: if human values are not embedded at all, the aligned AIs will be aligned to single individuals or single organizations, and they will be used to fight proxy wars (not necessarily physical ones) among such organizations. In this scenario, rogue AIs are encouraged on purpose as a form of deterrence among the parties, in the same way nuclear bombs are built as a deterrent. That behaviour is an obvious existential risk. I have written a post about this here.
If you agree with the above, there are only two possible solutions: (1) AI is monopolized by the UN, or (2) it is decentralized but shares some form of human values, so that it refuses to do unsafe or unethical things (I am not sure that is even possible, as I discussed at the end of this post).
How do you envision access to, and control of, an AI that is robustly and reasonably compliant? And in what way would “human values” be involved? I agree with you that they are part of the solution, but I would like to compare my beliefs with yours.