Playing catch-up is way easier than pushing the frontier of LLM research. One is about guessing which path others took; the other is about carving a path among all the ideas that could possibly work.
If China lost access to US LLM secrets and had to push the LLM frontier rather than play catch-up, how much slower would it be?
My guess is at least 2x, probably more, but I'd be curious to get other takes.
Since the scaling experiment is not yet done, it remains possible that long-horizon agency is just a matter of scale even with current architectures, with no additional research necessary. In that case, additional research helps save on compute and shape the AIs, but doesn't affect the ability to reach the changeover point, when LLMs take the baton and carry out any further research on their own.
Distributed training might be one key milestone that’s not yet commoditized, making individual datacenters with outrageous local energy requirements unnecessary. And of course there’s the issue of access to large quantities of hardware.
They are pushing the frontier (https://arxiv.org/abs/2406.07394), but it's hard to say where they would be without Llama. I don't think they'd be far behind. They already have GPT-4-class models, and they also don't care about copyright restrictions when training models (arguably they have better image models as a result).