I liked various parts of this post and agree that this is an under-discussed but important topic. I found it a little tricky to understand the information security section. Here are a few disagreements (or possibly just confusions).
> A single project might motivate more serious attacks, which are harder to defend against.
> It might also motivate earlier attacks, such that the single project would have less total time to get security measures into place.
In general, I think it’s more natural to think about how expensive an attack will be and how harmful that attack would be if it were successful, rather than reasoning about when an attack will happen.
Here I am imagining that you think a single project could motivate earlier attacks because overall US adversaries are more concerned about the US’s AI ambitions, or because AI progress is faster and it’s more useful to steal a model. It’s worth noting that stealing AI models whilst progress is mostly due to scaling and models are not directly dangerous or automating AI R&D doesn’t seem particularly harmful (in that it’s unlikely to directly cause a GCR or significantly speed up the stealer’s AGI project). So overall, I’m not sure whether you think the security situation is better or worse in the case of earlier attacks.
> A single project could have *more* attack surface, if it’s sufficiently big. Some attack surface scales with the number of projects (like the number of security systems), but other kinds of attack surface scale with total size (like the number of people or buildings). If a single project were sufficiently bigger than the sum of the counterfactual multiple projects, it could have more attack surface and so be less infosecure.
I don’t really understand your model here. I think naively you should be comparing a central US project to multiple AI lab projects. My current impression is that for a fixed amount of total AI lab resources the attack surface will likely decrease (e.g. only need to verify one set of libraries is secure, rather than 3 somewhat different sets of libraries). If you are comparing just one frontier lab to a large single project then I agree attack surface could be larger but that seems like the wrong comparison.
I don’t understand the logic of step 2 of the following argument.
> 1. If it’s harder to steal the weights, fewer actors will be able to do so.
> 2. China is one of the most resourced and competent actors, and would have even stronger incentives to steal the weights than other actors (because of race dynamics).
> 3. So it’s more likely that centralising reduces proliferation risk, and less likely that it reduces the chance of China stealing the weights.
I think that China has stronger incentives than many other nations to steal the model (because it is politically and financially cheaper for them) but making it harder to steal the weights still makes it more costly for China to steal the weights and therefore they are less incentivised. You seem to be saying that it makes them more incentivised to steal the weights but I don’t quite follow why.