Oh, I have little need for the word “plan,” but it’s more convenient than various circumlocutions. Whatever it is that I’ve been calling a plan is smeared over those 175B weights and, as such, is perfectly accessible to next-token myopia. (Still, check out this tweet stream by Charles Wang.)
It’s just that, unless you’ve got some sophistication – and I’m slowly moving in that direction – saying that transformers work by next-token prediction is about as informative as saying that a laptop works by shuffling data and instructions back and forth between the processor and memory. Both statements are true, but neither tells you much.
And when “next-token prediction” appears in the vicinity of “stochastic parrots” or “auto-complete on steroids,” then we’ve got trouble. In that context the typical reader of, say, The New York Times or The Atlantic is likely to think of someone flipping coins or of a bunch of monkeys banging away on typewriters. Or maybe they’ll think of someone throwing darts at a dictionary or reaching blindly into a bag full of words, images that aren’t any more helpful.
Of course, things are different in this forum, which is why I posted that piece here. The discussion has helped me a lot. But it’s going to take a lot of work to figure out how to educate the general reader.
Thanks for the comment.