It’s interesting how Microsoft and NVIDIA are plugging EleutherAI and open source work in general. While they don’t reference EleutherAI by name, the Pile dataset used as the basis for their training data and the LM Evaluation Harness mentioned in the post are both open source efforts by EleutherAI. EleutherAI, in return, is using the Megatron-DS codebase as the core of their GPT-NeoX model architecture.
I think that this is notable because it’s the first time we’ve really seen powerful AI research orgs sharing infra like this. Typically everyone wants to build everything bespoke and do all the work themselves. This is good for branding, but obviously a lot more work.
I wonder if MSFT and NVIDIA tried to make a better dataset than the Pile on their own and failed.
It may just be the incentives. “Commoditize your complement”.
Nvidia wants to sell GPUs, and that’s pretty much it; any services they sell are tightly coupled to the GPUs, and they don’t sell smartphones or banner ads. Microsoft wants to sell MS Azure and, to a lesser extent, business SaaS, and while it has many fingers in many pies, those tails do not wag the dog. NV/MS releasing tooling like DeepSpeed, and being pragmatic about using The Pile since it already exists (instead of spending scarce engineer time building their own just to have their own), is consistent with that.
In contrast, Facebook, Google, Apple, Alibaba, and Baidu all sell different things, typically far more integrated into a service/website/platform, vertically integrated from the web advertising down to the NN ASICs on their in-house smartphones. Google may be unusually open in terms of releasing research, but they still won’t release the actual models trained on JFT-300M/B or on web scrapes like their ALIGN, or models touching on core business vitals like advertising, or their best models like LaMDA* or MUM or Pathways. Even academics ‘sell’ very different things than happy end users on Nvidia GPUs / MS cloud VMs: prestige, citations, novelty, secret sauces, moral high ground. Not necessarily open data and working code.
* The split incentives lead to some strange behavior, like the current situation where there are already something like 6 notable Google-authored papers on LaMDA revealing fascinating capabilities like general text style transfer… none of which use its name, referring to it only as “a large language model” or something similar. (Sometimes they’ll generously specify that the model in question is O(100b) parameters.)