If you follow through on this view it seems to lead to the position that everyone has their own referent for “good”, and there is no meaningful way for two different humans to argue about whether a given action is good. Which would suggest there is little point trying to persuade other people to be good, or hoping to collaboratively construct a friendly AI (since an l-friendly AI is unlikely to be e-friendly).
Cooperation does not require modifying others so that they have identical values. Even agents with actively opposed values can cooperate (and so create a mutually friendly AI), so long as the opposition is not perfect in all regards.
This site has been at pains to emphasise that an AI will be an optimization process of never-before-seen power, rewriting reality in ways that we couldn’t possibly predict, and as such an AI whose values are even slightly misaligned with one’s own would be catastrophic for one’s actual values.
What is relevant to the decision to create such an AI, or to prevent it from operating, is the comparison between what the AI will do and what will occur in its absence. For example, gwern's values are not identical to mine, but if I had the choice between pressing a button to release an FAI&lt;gwern&gt; or a button to destroy it, I would press the button to release it. FAI&lt;gwern&gt; isn't as good as FAI&lt;me&gt; (by subjective tautology), but FAI&lt;gwern&gt; is overwhelmingly better than nothing. I expect FAI&lt;gwern&gt; to allow me to live for millions of years, and for the cosmic commons to be exploited to do things that I generally approve of. Without that AI I think it most likely that I and my species will go to oblivion.
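As a minimal sketch of that comparison, with entirely made-up utility numbers and a hypothetical press_release_button helper: the live choice is between the other agent's FAI and no FAI at all, not between the other agent's FAI and my ideal one.

    # Toy comparison with made-up utility numbers: the live choice is
    # "release FAI<gwern>" vs "no FAI at all", not "FAI<gwern>" vs "FAI<me>".
    utilities_for_me = {
        "FAI<me>": 1.00,     # my ideal outcome (subjective tautology)
        "FAI<gwern>": 0.90,  # close but not identical values
        "no FAI": 0.01,      # the likely default: oblivion
    }

    def press_release_button(u_other_fai: float, u_no_fai: float) -> bool:
        """Release the other agent's FAI iff it beats the default outcome."""
        return u_other_fai > u_no_fai

    print(press_release_button(utilities_for_me["FAI<gwern>"],
                               utilities_for_me["no FAI"]))  # True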
The above doesn't even take into account cooperation mechanisms. That's just flat acceptance of optimisation for another's values over the distinctly sub-optimal default for my own. When agents with conflicting values cooperate, negotiation applies: if both agents are rational, and are in a situation where mutual FAI creation is possible but unilateral FAI creation can be prevented, then the result will be an FAI that optimises for a compromise of the two value systems. To whatever extent the values of the two agents are not perfectly opposed, this outcome will be superior to the non-cooperative outcome. For example, if gwern and I were in such a situation the expected result would be the release of FAI&lt;gwern+me&gt;. Neither of us would prefer that option over the FAI personalised to ourselves, but there is still a powerful incentive to cooperate: that outcome is better than what we would have without cooperation. The same applies if a paperclip maximiser and a staple maximiser are put in that situation. (It does not apply if a paperclip maximiser meets a paperclip minimiser.)
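A rough illustration of the negotiation point, again with made-up payoffs and a hypothetical compromise_payoffs helper: any split of the cosmic commons beats the non-cooperative default for both a paperclip maximiser and a staple maximiser, but no split can do so when the values are perfectly opposed.

    # Toy negotiation model with made-up payoffs: a compromise FAI splits the
    # cosmic commons between two goal systems; the non-cooperative default
    # gives both agents (approximately) nothing.
    def compromise_payoffs(split: float) -> tuple[float, float]:
        """Payoffs to (paperclip maximiser, staple maximiser) if a joint FAI
        devotes `split` of resources to paperclips and the rest to staples."""
        return split, 1.0 - split

    no_fai = (0.0, 0.0)  # payoffs without cooperation
    clips, staples = compromise_payoffs(0.5)
    print(clips > no_fai[0] and staples > no_fai[1])  # True: both prefer the joint FAI

    # Perfectly opposed values break this: a paperclip *minimiser* scores the
    # negation of the maximiser's payoff, so any split that helps one hurts
    # the other, and no compromise beats the default for both.
    maximiser_payoff, _ = compromise_payoffs(0.5)
    minimiser_payoff = -maximiser_payoff
    print(minimiser_payoff > 0.0)  # False: no mutually beneficial FAI exists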