Epistemic status: I'd give >10% on Metaculus resolving the following as conventional wisdom[1] in 2026.
Autoregressive-modeling-of-human-language capabilities are well-behaved; scaling laws can help us predict what happens; interpretability methods developed on smaller models scale up to larger ones; …
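The "scaling laws can help us predict" part is the concrete piece of this claim: you fit a simple power law to small-model losses and extrapolate. A minimal sketch below, assuming a Chinchilla-style form L(N) = E + A/N^α; every number in it is synthetic and purely illustrative, not a fitted value from any real run.

```python
# Minimal sketch: fit a Chinchilla-style power law L(N) = E + A / N**alpha
# to losses from small models, then extrapolate to a much larger one.
# All constants and data points below are synthetic, for illustration only.
import numpy as np
from scipy.optimize import curve_fit

def loss(n_params, E, A, alpha):
    """Irreducible loss plus a power-law term in parameter count."""
    return E + A / n_params**alpha

rng = np.random.default_rng(0)
# Synthetic (parameter count, eval loss) pairs standing in for small-model runs.
n_small = np.array([1e7, 3e7, 1e8, 3e8, 1e9])
l_small = loss(n_small, E=1.7, A=400.0, alpha=0.34) + rng.normal(0.0, 0.01, n_small.size)

# Fit on the small models, then predict the loss of a model 100x larger than any of them.
(E_hat, A_hat, alpha_hat), _ = curve_fit(loss, n_small, l_small, p0=[1.5, 100.0, 0.3])
print(f"predicted loss at 1e11 params: {loss(1e11, E_hat, A_hat, alpha_hat):.3f}")
```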
Models-learning-from-themselves have runaway potential; how a model changes after [more training / architecture changes / training setup modifications] is harder to predict than in models trained on 2022 datasets.
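To get a feel for why a model-data feedback loop is less predictable than a fixed corpus, here is a deliberately toy sketch that has nothing to do with LLM training specifically: repeatedly fit a Gaussian to samples drawn from the previous fit. The Gaussian and the sample sizes are stand-ins; the point is only that the self-trained chain's trajectory depends on its own sampling noise, while the fixed-dataset fit does not.

```python
# Toy illustration: a fixed dataset gives the same fit every time you train on it,
# while a chain that refits only on its own generated samples drifts, and its
# spread tends to shrink over generations.
import numpy as np

rng = np.random.default_rng(0)

# A fixed "human" dataset: refitting on it is perfectly repeatable.
human_data = rng.normal(loc=0.0, scale=1.0, size=50)

# Self-training loop: each generation is fit only to samples from the previous fit.
mu, sigma = human_data.mean(), human_data.std()
for generation in range(100):
    synthetic = rng.normal(mu, sigma, size=50)     # "model-generated data"
    mu, sigma = synthetic.mean(), synthetic.std()  # refit on it, discard everything else

print(f"fit to the fixed dataset:           mean={human_data.mean():+.2f}, std={human_data.std():.2f}")
print(f"after 100 self-trained generations: mean={mu:+.2f}, std={sigma:.2f}")
```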
Replacing human-generated data with model-generated data was a mistake[2].
[1] In the sense that, e.g., "chain of thought improves capabilities" is conventional wisdom in 2022.
[2] In the sense of x-safety. I have no confident insight either way on how abstaining from very large human-generated datasets influences capabilities long-term. If someone does, please refrain from discussing that publicly, of course.