If we have an algorithm that aligns an AI with arbitrary values X, then we can plug in human values to get an AI that is aligned with human values.
On the other hand, I agree that it doesn’t really make sense to declare an AI safe in the abstract, rather than with respect to, say, human values. (Small counterpoint: safety isn’t just about alignment; you also need to avoid bugs, and bug-freeness can be defined without reference to human values. That alone isn’t sufficient for safety, however.)
I suppose this works as a criticism of approaches like quantilisers or impact-minimisation, which attempt abstract safety. But I can’t see any reason why it would imply that it’s impossible to write an AI that can be aligned with arbitrary values.
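For reference, the core idea of a quantiliser is to replace pure utility maximisation with sampling from the top q-fraction of some base distribution over actions. A minimal sketch, assuming a finite action set and a uniform base distribution (the function name and parameters here are illustrative, not from any particular library):

```python
import random

def quantilise(actions, utility, q=0.1, rng=random):
    """Pick an action from the top q-fraction of the (uniform) base
    distribution over `actions`, ranked by `utility`, rather than
    always taking the argmax.
    """
    ranked = sorted(actions, key=utility, reverse=True)
    cutoff = max(1, int(len(ranked) * q))  # size of the top q-fraction
    return rng.choice(ranked[:cutoff])
```

The point of sampling instead of maximising is that a hard argmax concentrates probability on extreme, possibly perverse actions, while a quantiliser only ever does things that a "typical good" draw from the base distribution would do.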