Both DM/GB have moved enormously towards scaling since May 2020, and there are a number of enthusiastic scaling proponents inside both, in addition to the obvious output of things like Chinchilla or PaLM. (Good for them, not that I really expected otherwise, given that stuff just kept happening and happening after GPT-3.) This happened fairly quickly for DM (given when Gopher was apparently started), and somewhat slower for GB despite Dean’s history & enthusiasm. (I still think MoEs were a distraction.) I don’t know enough about the internal dynamics to say whether they are fully scale-pilled, but scaling has worked so well, even in crazy applications like dropping language models into robotics planning (SayCan), that critics are in pell-mell retreat and people are getting away with publishing manifestos like “reward is enough” or openly saying on Twitter “scaling is all you need”. I suspect top-down organizational constraints are now the bigger deal: I’m far from the first person to note that DM/GB seem unable to ship (publicly visible) products, and researchers keep fleeing to startups where they can be more like OA in actually shipping.
FAIR puzzles me because FAIR researchers are certainly not stupid or blind, FB continues to make large investments in hardware like its new GPU cluster, and the most interesting FAIR research is strongly scaling-flavored, like their unsupervised work on audio/video, so you’d think they’d’ve caught up. But FB is also experiencing heavy weather while Zuckerberg seems to be aiming it all at ‘metaverse’ applications (which leads away from DRL), and further, FAIR has recently been somehow broken up & everyone reorganized (?). Meanwhile, of course, Yann LeCun continues saying things like ‘general intelligence doesn’t exist’, scoffing at scaling, and proposing elaborately engineered modular neuroscience-based AGI paradigms. So it looks like they’re grudgingly backing into scaling work simply because they are forced to if they want any results worth publishing, or systems which can meet Zuckerberg’s Five Year Plans, but one could not call FAIR scaling-pilled. Scaling enthusiasts there are probably chilled from proposing any explicit scaling research or spelling out why it matters, which will shut down anything daring.