It does sound like a lot: 5 OOMs to reach human learning efficiency, and then 8 OOMs more on top of that. But when we BOTECed the sources of algorithmic efficiency gain over the human brain, it seemed like you could easily get more than 8. Agreed it seems like a lot, though; we are talking about ultimate physical limits here!
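For concreteness, here's a minimal sketch of the kind of stacking argument I mean. Every number in it is a made-up placeholder (the source names and per-source OOM figures are assumptions for illustration, not our actual BOTEC); the point is just that OOMs from independent sources add in log space, so it doesn't take heroic per-source gains to exceed 8:

```python
# A minimal BOTEC sketch with hypothetical per-source gains; the numbers
# below are illustrative placeholders, not the thread's actual estimates.
# OOMs from independent sources add in log space, so a few modest
# sources stack past 8 OOMs quickly.
hypothetical_sources = {
    "architecture better than evolution's": 2.0,
    "curriculum / data-ordering gains": 1.5,
    "no serial childhood (parallel experience)": 2.0,
    "stripping biological constraints (noise, energy)": 1.5,
    "amortizing learning across copies": 2.0,
}

total_ooms = sum(hypothetical_sources.values())
print(f"{total_ooms:.1f} OOMs stacked = {10 ** total_ooms:,.0f}x efficiency")
# -> 9.0 OOMs stacked = 1,000,000,000x efficiency
```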
Interesting re the early years. So you'd accept that learning from age 5 or 6 onward could be OOMs more efficient, but would deny that the early years could be improved?
Though you're not really speaking to the 'undertrained' point, which is about the number of params relative to the number of data points.
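To spell out what I mean by 'undertrained' (in the Chinchilla sense): a model is undertrained when its parameter count is large relative to the data points it was trained on. A minimal sketch, using the roughly 20-tokens-per-parameter compute-optimal heuristic from Hoffmann et al. (2022); the example figures are the published GPT-3 and Chinchilla numbers:

```python
# Sketch of the 'undertrained' check: training tokens per parameter
# versus the ~20 tokens/param compute-optimal heuristic from
# Hoffmann et al. (2022), the "Chinchilla" paper.
CHINCHILLA_TOKENS_PER_PARAM = 20.0

def undertrained(params: float, tokens: float) -> bool:
    """True if the model saw fewer tokens per param than the heuristic suggests."""
    return tokens / params < CHINCHILLA_TOKENS_PER_PARAM

# GPT-3: 175B params on ~300B tokens -> ~1.7 tokens/param, heavily undertrained.
print(undertrained(175e9, 300e9))   # True
# Chinchilla: 70B params on ~1.4T tokens -> 20 tokens/param, compute-optimal.
print(undertrained(70e9, 1.4e12))   # False
```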
Agreed there's an ultimate cap on software improvements; the worry is that it's very far away!