Wolfram has written an attempt and https://twitter.com/AndrewCurran_ is drafting one too. My own attempt from back in 2020, which I see little need to revise, is https://gwern.net/scaling-hypothesis#why-does-pretraining-work