So personal intent alignment is basically all we get except in perhaps very small groups.
I want to disagree here. I think that a widely acceptable compromise on political rules, and the freedom to pursue happiness on one’s own terms without violating others’ rights, is quite achievable and desirable. I think that having a powerful AI establish/maintain the best possible government given the conflicting sets of values held by all parties is a great outcome. I agree that this isn’t what is generally meant by ‘values alignment’, but I think it’s a more useful thing to talk about.
I do agree that large groups of humans seem to inevitably hold contradictory values, such that no perfect resolution is possible. I just think that is beside the point, and not what we should even be fantasizing about. I also agree that most people who seem excited about ‘values alignment’ mean ‘alignment to their own values’. I’ve had numerous conversations with such people about the problem of people with harmful intent towards others (e.g. sadism, vengeance), and I have yet to receive anything even remotely resembling a coherent response. Averaging values doesn’t solve the problem; it falls into weird, bad edge cases. Instead, you need to focus on a widely (but not necessarily unanimously) acceptable political compromise.