Overall this is great, and I have just one concern about a contingency that isn't covered.
Model development secrets being disproportionately important.
I know, it’s a really tricky thing to deal with if this becomes the case. But I think it’s worth having a ‘what if’ plan in place in case this suddenly happens. Imagine a lab is working on its research and some researcher stumbles across an algorithmic innovation that yields a million-fold improvement in training efficiency. The peak capability of a model that can be trained for a few thousand dollars is suddenly on par with the lab's leading multi-billion-dollar frontier model. Furthermore, that capability level is sufficient for the 3x speedup in AI R&D that you specified as a threshold of high concern. Under such circumstances, the secret becomes even more dangerous and valuable than the multi-billion-dollar-training-cost frontier model weights. It seems wise to have a plan in place for this contingency.
Something like: don’t tell your co-workers; report to a designated person whose job it is to evaluate potentially dangerous algorithmic developments and escalate them up the chain. You’d probably need a witness-protection-style isolation setup for the researchers who had been exposed to the secret. You’d also need to start planning around how soon others might stumble onto the same discovery, what other similar discoveries might be out there to be found, what government actions should be taken, etc.
I know this sounds like an implausible scenario to most people currently, but it isn’t ruled out as physically impossible, and it’s worth having a what-if plan for.
Some evidence that I claim supports the feasibility of a highly parallelized search for algorithmic improvements: https://arxiv.org/abs/2403.17844
This paper indicates that even very small-scale experiments on novel architectures can give a good approximation of how well those architectures will perform at scale. So a large search that runs many small-scale tests could be expected to surface promising leads.
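To make the search idea concrete, here is a minimal sketch of the extrapolation step I have in mind. It is not the paper's method; it just fits a simple power-law scaling curve (loss ≈ a·C^(−b)) to each candidate's cheap small-scale runs and ranks candidates by predicted loss at a frontier-scale compute budget. The architecture names, loss numbers, and compute budgets are all made up for illustration.

```python
import numpy as np

# Compute budgets (FLOPs) of the cheap proxy runs. Illustrative values.
small_compute = np.array([1e15, 1e16, 1e17])

# Measured validation losses for each candidate architecture at those
# budgets. Entirely synthetic: arch_B improves faster with compute.
candidates = {
    "arch_A": np.array([3.9, 3.4, 3.0]),
    "arch_B": np.array([4.1, 3.3, 2.7]),
}

def extrapolate_loss(compute, losses, target_compute):
    """Fit log(loss) = log(a) - b*log(C) by least squares, then
    evaluate the fitted power law at target_compute."""
    slope, intercept = np.polyfit(np.log(compute), np.log(losses), 1)
    return float(np.exp(intercept + slope * np.log(target_compute)))

target = 1e24  # hypothetical frontier-scale budget
ranking = sorted(
    candidates,
    key=lambda name: extrapolate_loss(small_compute, candidates[name], target),
)
print(ranking)  # candidates ordered by predicted loss at scale, best first
```

A real search would run many such candidates in parallel and promote only the best-extrapolating ones to larger budgets; the point is just that each proxy run is cheap, so the search can be wide.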
Glad to see my personal hobbyhorse got a mention!