I think if I were you, then, I would have focused more on how we already knew you could scale up transformers and get more and more impressive results. I had heard of (and maybe skimmed) some of those other papers, so I was already somewhat confident that scaling up transformers would yield more impressive results… but I didn’t quite believe it, deep down. Deep down I thought there was probably some catch or limitation I didn’t know of yet that would prevent this easy scaling from going much farther, or from leading to anything interestingly new. After all, speculation is easy; making predictions and then later confirming them is hard. Well, now it’s confirmed. This doesn’t change my credences that much (maybe they go from 60% to 90% for “we can scale up language models” and from roughly 20% to 30% for “we are within 5 years of some sort of transformative AI”), but it’s changed my gut.