Well, it’s intentionally a riff on that one. I wanted one that illustrated that these “shriek” situations, where some value system takes over and gets locked in forever, don’t necessarily involve “defectors”. I felt that the last scenario was missing something by concentrating entirely on the “sneaky defector takes over” aspect, and I didn’t see any that brought out the “shared human values aren’t necessarily all that” aspect.
Ah, good point! I have a feeling this is a central issue that is hardly discussed here (or anywhere).
Will MacAskill calls this the “actual alignment problem”.
Wei Dai has written a lot about related concerns in posts like “The Argument from Philosophical Difficulty”.