This site has been at pains to emphasise that an AI will be an optimisation process of never-before-seen power, rewriting reality in ways we couldn’t possibly predict, and that, as such, an AI whose values are even slightly misaligned with one’s own would be catastrophic for one’s actual values.
What is relevant to the decision to create such an AI, or to prevent it from operating, is the comparison between what will occur in the absence of the AI and what the AI will do. For example, gwern’s values are not identical to mine, but if I had the choice between pressing a button to release an FAI built around gwern’s values or a button to destroy it, I would press the button to release it. An FAI optimised for gwern’s values isn’t as good as an FAI optimised for mine (by subjective tautology), but it is overwhelmingly better than nothing. I expect such an FAI to allow me to live for millions of years, and for the cosmic commons to be exploited to do things that I generally approve of. Without that AI I think it is most likely that I and my species will go to oblivion.
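As a toy illustration of that comparison (the numbers are invented placeholders on an arbitrary utility scale, not anything claimed above), the button decision reduces to comparing two outcomes:

```python
# Toy utilities (to me) of the possible outcomes, on an arbitrary 0-1 scale.
# These numbers are illustrative assumptions only.
U_FAI_MINE = 1.00    # an FAI optimised for my own values (not on offer here)
U_FAI_GWERN = 0.90   # an FAI optimised for gwern's values: imperfect, but vast
U_NOTHING = 0.01     # no FAI at all: most likely oblivion for me and my species

# The choice offered is between releasing the gwern-values FAI and destroying it.
better_button = "release" if U_FAI_GWERN > U_NOTHING else "destroy"
print(f"Press the {better_button} button")  # -> release
```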
The above doesn’t even take cooperation mechanisms into account. That is just flat acceptance of optimisation for another’s values over a distinctly sub-optimal outcome for my own. When agents with conflicting values cooperate, negotiation applies: if both agents are rational, and are in a situation where mutual FAI creation is possible but unilateral FAI creation can be prevented, then the result will be an FAI that optimises for a compromise of the two value systems. To whatever extent the values of the two agents are not perfectly opposed, this outcome will be superior to the non-cooperative outcome. For example, if gwern and I were in such a situation, the expected result would be the release of an FAI optimised for a compromise between gwern’s values and mine. Neither of us would prefer that option over the FAI personalised to ourselves, but there is still a powerful incentive to cooperate: that outcome is better than what we would have without cooperation. The same applies if a paperclip maximiser and a staple maximiser are put in that situation. (It does not apply if a paperclip maximiser meets a paperclip minimiser.)
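A minimal sketch of the cooperation point (the 50/50 resource split and the `overlap` parameter are my own simplification, not part of the original argument): a compromise FAI beats the no-FAI baseline for both agents whenever their values are not perfectly opposed.

```python
# Toy model of FAI-by-negotiation. Two agents split the cosmic commons 50/50;
# each agent gets its own half plus whatever value it places on the other half.
# 'overlap' is an invented correlation between the two value systems:
#    1.0 -> identical values
#    0.0 -> orthogonal values (paperclip maximiser vs staple maximiser)
#   -1.0 -> perfectly opposed (paperclip maximiser vs paperclip minimiser)
# The no-cooperation baseline (no FAI, oblivion) is normalised to 0.

def compromise_payoff(overlap: float) -> float:
    """Each agent's payoff from a 50/50 compromise FAI."""
    return 0.5 + 0.5 * overlap

for label, overlap in [("gwern vs me", 0.9),
                       ("paperclipper vs stapler", 0.0),
                       ("paperclipper vs anti-paperclipper", -1.0)]:
    payoff = compromise_payoff(overlap)
    verdict = "cooperate" if payoff > 0 else "no gain from cooperation"
    print(f"{label}: compromise = {payoff:+.2f}, no-FAI baseline = 0.00 -> {verdict}")
```

Only in the perfectly opposed case does the compromise collapse to the baseline, which is why the argument excludes the paperclip-minimiser scenario.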