Fwiw, I think the people who made GPT were surprised by its capabilities. I’ve been building smaller language models professionally for five years, and I know far more about them than the average person, and I still don’t really understand how ChatGPT does some of the stuff it does. Ultimately I think it has to come down to language itself being systematic, rather than anything special about ChatGPT. I.e., the problem of fluently using language is just easier than we (like to) think, not that ChatGPT is magic.
There are scaling-laws papers, but they only predict how low the loss will go. No one has a very good idea of which capabilities emerge at a given loss level; we just know from experience that fundamentally new abilities do keep showing up as the loss goes down.
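For concreteness, here's a toy sketch of the kind of curve those papers fit: a power law in model size that you extrapolate to predict loss at larger scale. All the sizes, losses, and constants below are made up purely for illustration, not taken from any actual paper or training run.

```python
# Minimal sketch of a scaling-law fit: fit L(N) ~ (N_c / N)**alpha to
# observed losses at a few model sizes, then extrapolate to a larger size.
# All numbers are illustrative placeholders, not real measurements.
import numpy as np

# Hypothetical (parameter count, validation loss) pairs.
sizes = np.array([1e7, 1e8, 1e9, 1e10])
losses = np.array([4.5, 3.9, 3.4, 3.0])

# A power law is a straight line in log-log space, so fit it there.
slope, intercept = np.polyfit(np.log(sizes), np.log(losses), deg=1)
alpha = -slope                     # power-law exponent
n_c = np.exp(intercept / alpha)    # scale constant N_c

def predicted_loss(n_params):
    """Loss the fitted law predicts at a given parameter count."""
    return (n_c / n_params) ** alpha

# The fit tells you roughly how low the loss goes at 1e11 parameters...
print(f"predicted loss at 1e11 params: {predicted_loss(1e11):.2f}")
# ...but it says nothing about which capabilities appear at that loss.
```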
See here for scaling laws stuff: https://www.lesswrong.com/tag/scaling-laws
Thank you for sharing that.
If researchers who knew their shit were surprised when these capabilities emerged, and no new discoveries about that functionality happened afterwards, and no fundamental theoretical shift in understanding followed, then I would indeed suspect that even the competent people do not actually understand it now. Surprises of this caliber are a sign that new scientific theories are needed, not something to be explained away as an oversight in existing theories, especially when many bright minds had a strong incentive to look for it.