I think this should somewhat update people away from “we can prevent model weights from being stolen by limiting the outgoing bandwidth from the data center”, if that protection assumes that model weights are very big and [the dangerous part] can’t be made smaller.
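To see why the size assumption matters, here’s a back-of-envelope sketch. The egress cap, weight precision, and parameter counts are all hypothetical numbers of my own choosing; the proportionality is the point: whatever the cap, shrinking the dangerous part 10–100x shrinks the exfiltration time by the same factor.

```python
# Back-of-envelope exfiltration times under an egress cap. The 1 Mbps cap,
# fp16 weights, and parameter counts below are illustrative assumptions.

def exfiltration_days(params_billions: float, bytes_per_param: float,
                      cap_mbps: float) -> float:
    """Days needed to move a checkpoint out at a sustained bandwidth cap."""
    size_bits = params_billions * 1e9 * bytes_per_param * 8
    return size_bits / (cap_mbps * 1e6) / 86_400

CAP_MBPS = 1  # hypothetical data-center egress cap

for label, params_b in [("full model", 1000), ("10x smaller", 100), ("100x smaller", 10)]:
    days = exfiltration_days(params_b, bytes_per_param=2, cap_mbps=CAP_MBPS)
    print(f"{label:>12}: {params_b:>4}B params -> {days:6.1f} days at {CAP_MBPS} Mbps")
```

With these numbers a 1T-parameter checkpoint takes roughly half a year to exfiltrate, but a 100x-smaller one slips out in under two days, which is the regime where a bandwidth cap stops being much of a defense.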
I’d also bet that, even if DeepSeek turns out to be somehow “fake” (e.g. optimized for benchmarks in some way; not that this currently seems to be the situation), some other way of making at least the dangerous[1] parts of a model much smaller[2] will be found and known[3] publicly (one candidate mechanism is sketched after the footnotes).
If someone is stealing a model, they probably care about “dangerous” capabilities like ML engineering and the ability to autonomously act in the world, but not about “not dangerous” capabilities like memorizing Harry Potter and all its fan fiction. If you’re interested in betting with me, I’d probably let you judge what is and isn’t dangerous. Also, as far as I can tell, DeepSeek is much smaller without giving up a lot of knowledge, so the claim I’m making in this bet is even weaker.
At least 10x smaller, but I’d also bet on 100x at some odds.
This sets a lower bar for the secret capabilities a nation state might have if they’re trying to steal model weights that are defended this way. So again, I expect the attack we’d actually see against such a plan to be even stronger.
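The bet above doesn’t commit to how the shrinking happens, but knowledge distillation is one well-known candidate mechanism: train a small “student” network to match a large “teacher”’s output distribution. A minimal sketch with toy MLPs and random data (purely illustrative; I’m not claiming this is what DeepSeek did):

```python
# Minimal knowledge-distillation sketch: a ~30x-smaller student learns to
# match the teacher's softened output distribution. Toy models, random data.
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(nn.Linear(64, 1024), nn.ReLU(), nn.Linear(1024, 10))
student = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 10))

opt = torch.optim.Adam(student.parameters(), lr=1e-3)
T = 2.0  # softmax temperature; softens the teacher's distribution

for step in range(100):
    x = torch.randn(256, 64)  # stand-in for real training inputs
    with torch.no_grad():
        teacher_logits = teacher(x)
    student_logits = student(x)
    # KL divergence between softened teacher and student distributions
    loss = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * T * T
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The same recipe applies to transformers at scale; the open question the bet gestures at is whether the dangerous capabilities in particular survive a 10–100x compression.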
Yes, but I think the larger update is that recent models from OpenAI are likely quite small, and inference-time compute usage creates more of an incentive for small models. It seems likely that (e.g.) o1-mini is quite small given that it generates at 220 tokens per second(!): perhaps <30 billion active parameters based on the link from Epoch given earlier; I’d guess (idk) ~100 billion total params. Likely something similar holds for o3-mini.
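For intuition on why 220 tokens/s points at a small model: at low batch sizes, autoregressive decoding is roughly memory-bandwidth bound, since every generated token requires streaming all active weights through the GPU once. A sketch with assumed hardware numbers (H100-class bandwidth and fp8 weights are my assumptions, not anything from the comment or from OpenAI):

```python
# Rough upper bound on active parameters from observed decode speed, assuming
# batch-size-1 decoding is memory-bandwidth bound: each token requires reading
# every active parameter from HBM once. Hardware and precision are assumed.

HBM_BANDWIDTH_BYTES_PER_S = 3.35e12  # e.g. one H100 SXM (assumption)
TOKENS_PER_S = 220                   # o1-mini's reported decode speed
BYTES_PER_PARAM = 1                  # assuming fp8 weights

max_active_params = HBM_BANDWIDTH_BYTES_PER_S / (TOKENS_PER_S * BYTES_PER_PARAM)
print(f"<= {max_active_params / 1e9:.0f}B active params per GPU")  # ~15B

# Batching, speculative decoding, and multi-GPU sharding all loosen this
# bound in practice, so treat it as order-of-magnitude only.
```

Even as an order-of-magnitude estimate, this is consistent with the <30B active-parameter guess above.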
(I think the update from DeepSeek in particular might be smaller than you think, as export controls create an artificial incentive for smaller models.)