When AI alignment researchers talk about ‘alignment’, they often seem to have a mental model where either (1) there’s a single relevant human user whose latent preferences the AI system should become aligned with (e.g. a self-driving car with a single passenger); or (2) there’s all 7.8 billion humans that the AI system should be aligned with, so it doesn’t impose global catastrophic risks.
[...]
So, I’m left wondering what AI safety researchers are really talking about when they talk about ‘alignment’.
The simple answer here is that many technical AI safety researchers on this forum talk exclusively about (1) and (2) so that they can avoid confronting all of the difficult socio-political issues you mention. Many of them avoid these issues specifically because they believe they would not be very good at politics anyway.
This is of course a shame, because the cases between (1) and (2) have a level of complexity that also needs to be investigated. I am a technical AI safety researcher who is increasingly moving into the space between (1) and (2), in part also because I consider (1) and (2) to be more solved than many other AI safety researchers on this forum like to believe.
This then has me talking about alignment with locally applicable social contracts, and about the technology of how such social contracts can be encoded into an AI. See for example the intro post and paper here.
Koen, thanks for your comment. I agree that too many AI safety researchers seem to be ignoring all these socio-political issues relevant to alignment. My worry is that, given that many human values are tightly bound to political, religious, tribal, and cultural beliefs (or at least people think they are), ignoring those values means we won’t actually achieve ‘alignment’ even when we think we have. The results could be much more disastrous than knowing we haven’t achieved alignment.
You are welcome. Another answer to your question just occurred to me.
If you count AI fairness research as a sub-type of AI alignment research, then you can find a whole community of alignment researchers who talk quite a lot with each other about ‘aligned with whom’ in quite sophisticated ways. Reference: the main conference of this community is ACM FAccT.
In EA and on this forum, when people count the number of alignment researchers, they usually count dedicated x-risk alignment researchers only, and not the people working on fairness, or on the problem of making self-driving cars safer. There is a somewhat unexamined assumption in the AI x-risk community that fairness and self-driving car safety techniques are not very relevant to managing AI x-risk, both in the technical space and the policy space. The way my x-risk technical work is going, it is increasingly telling me that this unexamined assumption is entirely wrong.
On a lighter note:
ignoring those values means we won’t actually achieve ‘alignment’ even when we think we have.
Well, as long as the ‘we’ you are talking about here is a group of people that still includes Eliezer Yudkowsky, then I can guarantee that ‘we’ are in no danger of ever collectively believing that we have achieved alignment.
Koen—thanks for the link to ACM FAccT; looks interesting. I’ll see what their people have to say about the ‘aligned with whom’ question.
I agree that AI x-risk folks should probably pay more attention to the algorithmic fairness folks and self-driving car folks, to see what general lessons about alignment can be learned from these specific domains.