The argument seems to be, if Preference1 is too different from Preference1, then Preference1 is a bad method of preference-extraction and should be rethought. A good method Preference2 for preference-extraction should have Preference2 much closer to Preference2. And since Preference1 is inadequate, as demonstrated by this test case, Preference1 is also probably hugely worse for cousin_it than Preference2, even if Preference2 is better than Preference2.
We are not that wise, in the sense that any moral progress we’ve achieved, if it’s indeed progress (so that on reflection both past and future would agree that the direction was right) and not arbitrary change, shouldn’t be a problem for an AI to repeat. Thus this progress in particular (as opposed to other possible differences) shouldn’t contribute to differences in extracted preference.
Of course the above constraint isn’t nearly enough to uniquely specify Preference2.