Right. I should’ve emphasized the time-lag component. I guess I’ve been taking it for granted, since I primarily think of LLM cognitive architectures, not LLMs themselves, as the danger.
The existence of other low-hanging fruit makes that situation worse. Even once o1-like post-training is commoditized and included in testing, there will be other cognitive enhancements with the potential to add dangerous capabilities.
In particular, the addition of useful episodic memory or other forms of continuous learning may have a nonlinear contribution to capabilities. Such learning already exists in at least two forms. Both of those and others are likely being improved as we speak.
Depending on how far inference scaling laws go, the situation might be worse still. Picture Llama-4-o1 scaffolds that anyone can run indefinitely (as long as they have the money and compute) to autonomously do ML research on ways to improve Llama-4-o1 and its open-weights descendants, whose improvements could in turn be applied to further autonomous ML research. Fortunately, lack of access to enough compute to pretrain the next-generation model is probably a barrier for most actors, but this still seems like a scary situation to be in, and increasingly so with every open-weights improvement.
Now I’m picturing that, and I don’t like it.
Excellent point that these capabilities will contribute to more advancements that will compound rates of progress.
Worse yet, I have it on good authority that a technique much like the one o1 is thought to use can be applied to open-source models at very low cost and with little human effort. It’s unclear how effective it is at those low levels of cost and effort, but it’s definitely useful, so it’s likely scalable to intermediate-sized projects.
Here’s hoping that the terrifying acceleration of proliferation and progress is balanced by the inherent ease of aligning LLM agents, and by the relatively slow takeoff speeds, giving us at least a couple of years to get our shit halfway together, including with the automated alignment techniques you focus on.
Related:
And more explicitly, from GDM’s Frontier Safety Framework, pages 5-6:
Image link is broken.
Interesting. Many of these things seem important as evaluation issues, even though I don’t think they are important algorithmic bottlenecks between now and superintelligence (because either they quickly fall by default, or, if only great effort gets them to work, they still won’t crucially help). So there are blind spots in evaluating open-weights models that become more impactful with greater scale of pretraining and less impactful with more algorithmic progress, since that progress enables evaluations to see better.
Compute and funding could be important bottlenecks for multiple years if $30 billion training runs don’t cut it. The fact that o1 is already possible (if not o1 itself) might actually matter for timelines, but indirectly: by attracting more funding for compute, which could cut through hypothetical capability thresholds that would otherwise require significantly better hardware or algorithms.