Elaborating on my comment (on the world where training time is the bottleneck, and engineers help):
To the extent that major progress and flashy results depend on massive engineering efforts, this seems to lower the portability of advances and make it more difficult for teams to form coalitions. [Compare to a world where you just have to glue together different conceptual advances, and so you plug one model into another and are basically done.] It also means we should think about how progress happens in other fields with lots of free parameters that are optimized jointly; semiconductor manufacturing is the primary example that comes to mind, where about a dozen different fields of engineering are all constrained by each other and the joint tradeoffs are nightmarish to behold or manage. [Subfield A would be much better off if we switched from silicon to germanium, but everyone else would scream; then again, perhaps we'll need to switch eventually anyway.]

The more bloated all of these projects become, the harder it is to do fundamental reimaginings of how these things work. A favorite example of mine here is replacing matmuls in neural networks with bitshifts, also known as "you only wanted the ability to multiply by powers of 2, right?", which seems ludicrously more efficient and is still pretty trainable, but requires thinking about gradient updates differently. The more effort you've put into optimizing how you pipe gradient updates around, the harder it is to make a transition like that.
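To make the bitshift point concrete, here's a minimal sketch of a power-of-two-weight linear layer in PyTorch. Everything in it is illustrative: the layer name, the rounding scheme, and the straight-through gradient trick are my assumptions about one way such a layer could work, not a reference implementation of any particular published method. The forward pass uses weights snapped to signed powers of two (so the multiplies could, in principle, become shifts), while gradients flow to the underlying full-precision weights as if no snapping had happened:

```python
import torch
import torch.nn as nn

class PowerOfTwoLinear(nn.Module):
    """Hypothetical linear layer whose effective weights are signed powers
    of two, so the forward multiply could in principle be a bit-shift."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight
        # Snap |w| to the nearest power of two (nearest in log space),
        # keeping the sign. clamp_min avoids log2(0) for zero weights.
        pow2 = torch.sign(w) * torch.exp2(
            torch.round(torch.log2(w.abs().clamp_min(1e-8)))
        )
        # Straight-through estimator: forward with the snapped weights, but
        # backpropagate as if we had used the full-precision weights.
        w_q = w + (pow2 - w).detach()
        return x @ w_q.t()

# Tiny usage example: the layer trains with an ordinary optimizer.
layer = PowerOfTwoLinear(4, 2)
opt = torch.optim.SGD(layer.parameters(), lr=0.1)
x, target = torch.randn(8, 4), torch.randn(8, 2)
loss = ((layer(x) - target) ** 2).mean()
loss.backward()
opt.step()
```

The straight-through trick is exactly where "thinking about gradient updates differently" bites: the snapping step has zero gradient almost everywhere, so you have to decide what fiction to tell the optimizer, and a stack heavily tuned around exact matmul gradients has the opposite decision baked in everywhere.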
This is also possibly quite relevant to safety: if it's hard to 'tack on safety' at the end, then it's important that we start with something safe and build the mountain of small improvements on top of it, rather than building that mountain on something that turns out not to be safe and then having to start over.