I think this is an interesting perspective, and I encourage more investigation.
Briefly responding, I have one caveat: the curse of dimensionality. If values live in a high-dimensional space (they do: they’re functions), then ‘off by a bit’ could easily mean ‘essentially zero measure overlap’. This is not the case in the illustration (which is 1-D).
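To make that concrete, here is a minimal numerical sketch (my own illustration, not anything from the original post): it assumes, purely for the sake of the example, that the two values are isotropic Gaussians offset by a small fixed amount in every coordinate, and measures their overlap with the Bhattacharyya coefficient, which decays exponentially in the dimension.

```python
# Illustrative sketch only: values modeled as isotropic Gaussians N(0, I_d)
# and N(eps * 1, I_d). The Bhattacharyya coefficient exp(-||mu||^2 / 8)
# measures their overlap; a fixed per-coordinate offset eps gives
# ||mu||^2 = d * eps^2, so the overlap shrinks exponentially with d.
import numpy as np

def gaussian_overlap(d: int, eps: float = 0.1) -> float:
    """Bhattacharyya coefficient between N(0, I_d) and N(eps*1, I_d)."""
    squared_distance = d * eps ** 2  # ||mu||^2 for a per-coordinate offset eps
    return float(np.exp(-squared_distance / 8.0))

for d in [1, 10, 100, 1_000, 10_000]:
    print(f"d = {d:>6}: overlap ≈ {gaussian_overlap(d):.3e}")
# The same per-coordinate error gives overlap ~0.999 at d = 1 but ~4e-6 at
# d = 10_000: 'off by a bit' per coordinate, essentially no overlap overall.
```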
I agree with your point about the difficulty of overlapping distributions in high-dimensional space. It’s not like the continuous perspective suddenly makes value alignment trivial. However, it seems to me that “overlapping two continuous distributions in a space X” is ~always easier than “overlapping two sets of discrete points in a space X”.
Of course, it depends on your error tolerance for what counts as “overlap” of the points. However, my impression from the way that people talk about value fragility is that they expect very little error tolerance between human and AI values.
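A small sketch of the contrast I have in mind (again using illustrative isotropic Gaussians, with a hypothetical per-coordinate error and tolerance I made up for the example): matching two point-values under a fixed tolerance is all-or-nothing and flips to “no match” once the dimension is large enough, whereas the overlap between two distributions degrades smoothly and never hits exactly zero.

```python
# Illustrative assumptions only: per-coordinate error delta, Euclidean
# tolerance tol for the "discrete points" view, and isotropic Gaussians for
# the "continuous distributions" view.
import numpy as np

def points_match(d: int, delta: float = 0.1, tol: float = 1.0) -> bool:
    """Do two point-values, off by delta in every coordinate, count as aligned?"""
    return np.sqrt(d) * delta <= tol  # hard yes/no: distance vs tolerance

def distribution_overlap(d: int, delta: float = 0.1) -> float:
    """Bhattacharyya coefficient between N(0, I_d) and N(delta*1, I_d)."""
    return float(np.exp(-d * delta ** 2 / 8.0))

for d in [1, 10, 100, 1_000]:
    print(f"d = {d:>5}: point match = {points_match(d)!s:5}  "
          f"distribution overlap = {distribution_overlap(d):.3f}")
# The point criterion is True up to d = 100 and False at d = 1_000, while the
# distributional overlap shrinks gradually (about 0.29 at d = 1_000) and
# remains nonzero.
```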