Interesting, many of these things seem important as evaluation issues, even though I don't think they are important algorithmic bottlenecks between now and superintelligence (because either they quickly fall by default, or else, if only great effort gets them to work, they still won't crucially help). So there are blind spots in evaluating open weights models that get more impactful with greater scale of pretraining, and less impactful with more algorithmic progress, since algorithmic progress lets evaluations see better.
Compute and funding could be important bottlenecks for multiple years, if $30 billion training runs don't cut it. The fact that something like o1 is already possible (even if not o1 itself) might actually be important for timelines, but indirectly, by enabling more funding for compute that cuts through the hypothetical thresholds of capability that would otherwise require significantly better hardware or algorithms.