I agree that this is a somewhat dated post. Janus has said something similar, and I’ve encouraged them to edit the intro to say “y’all shouldn’t have been impressed by this” or something. with that said, some very weak defenses of a couple of specific things:
having a “shoggoth”
the reasonable way to ground that is that the shoggoth is the set of hypersurfaces bounding the volumes that the network’s decision boundaries enclose. it’s mainly useful as a metaphor if it works as a way to translate into English the very basic idea that neural networks are function approximators. a lot of metaphorical terms are, in my view, attempts (which generally don’t succeed, especially, it seems, for you) to convey that neural networks are, fundamentally, just adjustable high-dimensional kaleidoscopes.
it isn’t contentless
it’s not trying to be highly contentful, it’s trying to correct a bunch of people’s wrong intuitions about the very basics of what is even happening. If you already grok how descending the gradient of the cross-entropy between the model’s predicted next-token distribution and the actual sequence pushes a language model to approximate a function which compresses towards the data’s entropy floor, then the idea that the model “learns to simulate” is far too vague and nonspecific. but if you didn’t already grok why that math is what we use to define how well the model is performing at its task, then it might not be obvious what that task is, and calling it a “simulator” helps clarify the task.
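for anyone who wants the "entropy floor" bit made concrete, here's a toy sketch (mine, not anything from the post). it assumes a made-up three-token "data distribution" `p` and a "model" that is nothing but three logits, which is obviously not a language model, but the loss is the same one. the names `fit`, `cross_entropy`, and `entropy` are just illustrative. gradient descent on cross-entropy drives the loss down toward the entropy of `p` and, by Gibbs' inequality, can't take it below that:

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(p, q):
    # H(p, q) = -sum_i p_i * log(q_i), in nats.
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def entropy(p):
    # H(p) = H(p, p): the floor that no predictor q can beat on data drawn from p.
    return cross_entropy(p, p)

def fit(p, steps=2000, lr=0.5):
    # The "model" is just one logit per token (a toy assumption).
    logits = [0.0] * len(p)
    losses = []
    for _ in range(steps):
        q = softmax(logits)
        losses.append(cross_entropy(p, q))
        # Gradient of H(p, softmax(logits)) with respect to the logits is q - p.
        logits = [z - lr * (qi - pi) for z, qi, pi in zip(logits, q, p)]
    return softmax(logits), losses

p = [0.7, 0.2, 0.1]  # hypothetical "true" next-token distribution
q, losses = fit(p)
floor = entropy(p)

print("initial loss:", losses[0])   # uniform guess, i.e. log(3)
print("final loss:  ", losses[-1])
print("entropy floor:", floor)
```

the gap between the loss and the floor is exactly the KL divergence from `p` to the model's `q`, so "minimize cross-entropy" and "match the data distribution" are the same instruction.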
that can often give a false sense of understanding.
yeah, agreed.