I guess the main doubt I have with this strategy is that even if we shift the vast majority of people/companies towards more interpretable AI, there will still be some actors who pursue black-box AI. Wouldn’t we just get screwed by those actors? I don’t see how CoEm can be of equivalent power to purely black-box automation.
That said, there may be ways to integrate CoEms into the Super Alignment strategy.
In section 5, I explain how CoEm is an agenda with relaxed constraints. It does not try to reduce the alignment tax to make the safety solution competitive for labs to use. Instead, it assumes there is enough progress in international governance that you have full control over how your AI gets built, and that there are enforcement mechanisms ensuring no competitive but unsafe AI can be built somewhere else.
That’s what the bifurcation of narrative is about: not letting labs implement only the solutions that have a low alignment tax, because that could simply not be enough.