I expect that the media firestorm around GPT-3 made it significantly easier to raise the capital + support within Google Brain to train PaLM.
Wouldn’t surprise me if this was true, but I agree with you that it’s possible the ship has already sailed on LLMs. I think this is more likely to be the case if you have a novel insight about which paths are most promising to AGI (similar to the scaling hypothesis in 2018): getting ~everyone to adopt that insight would significantly shorten timelines, though I’d argue that publishing it (such that only the labs explicitly aiming at AGI, like OpenAI and DeepMind, adopt it) is not clearly less bad than hyping it up.
Gopher is a good example of not really seeing much fanfare, I think? (Though I don’t spend much time on ML Twitter, so maybe there was loads lol)
Surely this is because it didn’t say anything except “DeepMind is also now in the LLM game”, which wasn’t surprising given that Geoff Irving had left OpenAI for DeepMind? There weren’t any groundbreaking techniques used to train Gopher, as far as I can remember.
Chinchilla, on the other hand, did see a ton of fanfare.
Ah, my key argument here is that most conceptual work is bad because it lacks good empirical examples, grounding, and feedback loops, and that if we were closer to AGI we could have these.
Cool. I agree with you that conceptual work is bad in part because of a lack of good examples/grounding/feedback loops, though I think this can be overcome with clever toy problem design and analogies to current problems (that you can then get the examples/grounding/feedback loops from). E.g. surely we can test toy versions of shard theory claims using the small algorithmic neural networks we’re able to fully reverse engineer.
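To make the toy-versions point concrete, here’s a minimal sketch of the kind of experiment I mean: train a tiny network on modular addition (the sort of algorithmic task whose learned circuits people have fully reverse engineered), then read the weights off directly. Everything here, the modulus, the 32-unit architecture, the training setup, is my own illustrative assumption rather than a reference to any published setup:

```python
# Hypothetical sketch: a tiny network on modular addition, small enough
# to inspect neuron by neuron. All hyperparameters are illustrative.
import torch
import torch.nn as nn

P = 7  # modulus for the toy algorithmic task

# Dataset: all pairs (a, b) with label (a + b) mod P, one-hot encoded.
pairs = torch.cartesian_prod(torch.arange(P), torch.arange(P))
x = torch.cat([nn.functional.one_hot(pairs[:, 0], P),
               nn.functional.one_hot(pairs[:, 1], P)], dim=1).float()
y = (pairs[:, 0] + pairs[:, 1]) % P

# A deliberately tiny model, so the learned solution stays legible.
model = nn.Sequential(nn.Linear(2 * P, 32), nn.ReLU(), nn.Linear(32, P))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

for step in range(2000):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    opt.step()

print(f"final loss: {loss.item():.4f}")

# Once trained, the weights are small enough to reverse engineer directly,
# e.g. by checking which input residues each hidden unit responds to.
# That inspection step is where you'd ground a toy version of a
# shard-theory-style claim in an actual, fully legible artifact.
with torch.no_grad():
    first_layer = model[0].weight  # shape (32, 2*P)
    print(first_layer[:4])  # peek at a few hidden units' input weights
```

The point of keeping it this small isn’t realism; it’s that you get the examples/grounding/feedback loops immediately, instead of waiting for systems closer to AGI.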