Much of your criticism is of the form “This is just a rehash of the GPT-2 paper; it doesn’t teach us anything new.” My reaction to this paper was: “In the GPT-2 paper, they made a prediction: that scaling up the same architecture would lead to more and more impressive and general capabilities. Now they’ve confirmed that prediction.”
I feel the need at this point to add that I upvoted this post, even though I disagree with much of it, because this sort of discussion is exactly the sort of thing I like to see on LW, and I thought the OP was a nice detailed criticism of an important paper (and more importantly, criticism of the hype that many people including myself may be feeling after reading it). Again, I ultimately am still hyped, but my hype would be hollow if I didn’t welcome criticisms of it!