I too would prefer a super AI to look to my values when deciding what to implement.
But, given the existence of moral disagreement, I don’t see why that deserves to be labeled Friendly. The whole point of CEV or any similar process is to figure out what is awesome for humanity, and implementing something other than what is awesome for all of humanity is not Friendly.
If deathism really is what is awesome for all of humanity, I expect an FAI to implement deathism. But there’s no particular reason to believe that deathism is what is awesome for humanity.
Tim, your comment highlights the potential conflict between CEV and FAI that I also mentioned previously. FAI is by definition not hostile to human beings, whereas CEV might permit, or even require, the extinction of all humanity. This may happen, for instance, if the process of coherent extrapolation shows that humans value certain superior beings more than they value themselves, and if the coexistence of humans and these beings is impossible.
When I pointed out this problem, both Kaj Sotala and Michael Anissimov replied that CEV can never condone hostile actions towards humanity because FAI is “defined as ‘human-benefiting, non-human-harming’”. However, this reply just proves my point, namely that there is a potential internal inconsistency between CEV and FAI.
Don’t look at me to resolve that conflict. I think moral extrapolation is unlikely to output anything coherent if the reference class is sufficiently large to avoid the objections I raised above. And I can’t think of any other plausible candidate to produce Friendly instructions for an AI.