No, it’s not, because we have a pretty good idea of how transistors work, and in fact someone needed to directly anticipate how they might work in order to engineer them. The “unknown” part about the deep learning models is not the network layer or the software that uses the inscrutable matrices; it’s how the model is getting the answers that it does.
Yes, it is, because it took like five years to understand minority-carrier injection.
I think he’s referring to understanding the precise mechanics of how transistors worked, or why the first working prototypes functioned while all the others didn’t. That’s just from skimming https://en.wikipedia.org/wiki/History_of_the_transistor
That’s the current understanding for LLMs: people do know at a high level what an LLM does and why it works, just as there were theories about how transistors would function decades before any working ones existed. But the details of why this system works when 50 other things that were tried didn’t are not known.