The problem of whether the goals and values of an artificially intelligent agent will align with human goals and values can be reduced to this problem: will the goals and values of different human agents ever align with each other?
I would be ecstatic if AI turned out to be perfectly aligned with any particular human, rather than aligned with no human at all. The problem of "oh no, what if DeepMind disagrees with the Chinese about which human values to put in the AI" is rather small compared with the problem of actually figuring out how to put any values at all into the AI.
If the context were instead the problem of sending a rocket to the moon, you would have assumed away the actual engineering of the rocket and would now be worrying about the human squabbling over which destination crater to pick.
Yes, even if all humans agreed on everything, there would still be significant technical problems in getting an AI to align with all of them. Most of the existing arguments for the difficulty of AI alignment would still hold in that case. If you (Henry) think these existing arguments are wrong, could you say something about why, i.e., offer counterarguments?