It sounds like the crux is whether having time with powerful (compared to today) but sub-AGI systems will make the time we have for alignment better spent. Does that sound right?
I’m thinking it will because i) you can better demonstrate AI alignment problems empirically to convince top AI researchers to prioritise safety work, ii) you can try out different alignment proposals and do other empirical work with powerful AIs, iii) you can try to leverage powerful AIs to help you do alignment research itself.
Whereas you think these things are so unlikely to help that getting more time with powerful AIs is strategically irrelevant?
Yeah, that’s right. Of your three channels for impact:
i) you can better demonstrate AI alignment problems empirically to convince top AI researchers to prioritise safety work, ii) you can try out different alignment proposals and do other empirical work with powerful AIs, iii) you can try to leverage powerful AIs to help you do alignment research itself
… (i) and (ii) both work ~only to the extent that the important problems are visible. Demonstrating alignment problems empirically ~only matters if they’re visible and obvious. Trying out different alignment proposals also ~only matters if their failure modes are actually detectable.
(iii) fails for a different reason, namely that by the time AIs are able to significantly accelerate the hard parts of alignment work, they’ll already have foomed. Reasoning: there’s generally a transition point between “AI is worse than human at task, so task is mostly done by human” and “AI is comparable to human or better, so task is mostly done by AI”. Foom occurs roughly when AI crosses that transition point for AI research itself. And alignment is technically similar enough to AI research more broadly that I expect the transition to be roughly-simultaneous for capabilities and alignment research.
Quick responses to your argument for (iii):
If AI automates 50% of both alignment work and capabilities research, it could help with alignment before foom (while also bringing foom forward in time).
A leading project might choose to use its AIs for alignment rather than for fooming.
AI might be more useful for alignment work than for capabilities work.
Fooming may require more compute than certain types of alignment work.
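A toy way to see the first point (an illustrative sketch, not from the discussion itself; the Amdahl-style model and all numbers are assumptions): if AI automates a fraction f of a research workflow and the automated portion becomes effectively free, the human-bottlenecked remainder caps the overall speedup at roughly 1/(1 - f), and this applies equally to alignment and capabilities work.

```python
# Toy Amdahl-style model (illustrative assumption, not a claim from the
# discussion): if AI automates fraction f of a workflow and the automated
# part takes negligible time, the remaining human-bottlenecked share (1 - f)
# limits the overall speedup to roughly 1 / (1 - f).

def speedup(f: float) -> float:
    """Overall speedup when a fraction f of the work is automated."""
    assert 0 <= f < 1, "f must be a fraction of the work strictly below 1"
    return 1.0 / (1.0 - f)

# At 50% automation, both alignment and capabilities research run ~2x faster:
# alignment gets more done per calendar month, but foom also arrives sooner.
print(speedup(0.5))  # → 2.0
print(speedup(0.9))  # ~10x, near the capabilities transition point
```

On this sketch, pre-foom automation buys alignment real acceleration, but symmetrically accelerates the path to the transition point, which is exactly the trade-off the parenthetical flags.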