If you want to solve AI safety before AI capabilities become too great, then it seems that AI safety must have some of the following:
More researchers
Better researchers
Fewer necessary insights
Easier necessary insights
Greater ability to steal insights from AI capabilities research than the reverse.
...
Is this likely to be the case? Why? Another way to ask this question is: under which scenarios does alignment not add time?
Some more ways:
If it turns out that capabilities and safety are not so dichotomous, and so robustness / interpretability / safe exploration / maybe even impact regularisation get solved by the capabilities lot.
If early success with a date-competitive, performance-competitive safety programme (e.g. IDA) puts capabilities research onto a safe path.
Let’s just save time by jumping to the place where the AI in charge of AI Safety goes Foom and takes us back to the Stone Age “for safety” ;)