2. Getting alignment right accounts for most of the variance in whether an AGI system will be positive for humanity.
MIRI, judging from their fundraiser, seems to think it is important that the first people to develop AI use it to develop other technologies to get to a safe period. This suggests to me that they care what is done with the first AGIs. From the fundraiser:

“…and if early AGI systems can be used safely at all, then we expect it to be possible for an AI-empowered project to safely automate a reasonably small set of concrete science and engineering tasks that are sufficient for ending the risk period.”
Most alignment research seems to be about aligning an AI to one person or group rather than to the whole of humanity; things like corrigibility seem to fit that pattern.
I also think you can get this type of “alignment” right, in the sense that the AI is aligned to one person or group, and still end up with a bad outcome. It depends a lot on the sanity of that group: a suicide cult that wants to destroy the earth would not be a good group for the AI to be aligned to, from the rest of our point of view.
Actually, I might just be wrong about this. There are very important questions about what goals to give the early AGIs.
I still think there is something important close to this claim, though, and I may write a separate post trying to state it more accurately.