Good question. Some thoughts on why we did this:

- Our results suggest we won’t be caught off-guard by highly capable models that were trained for years in secret, which seems strategically relevant for those concerned with risks
- We looked at whether there was any ‘alpha’ in these results by investigating the training durations of actual ML training runs, and found that models are typically trained for durations not far off from what our analysis suggests might be optimal (see a snapshot of the data here)
- It independently seems highly likely that large training runs are already optimized along this dimension, which further suggests that this has little to no action-relevance for advancing the frontier
Thanks for thinking it over, and I agree with your assessment that this is better as public knowledge than private :)
Thanks Tamay!