What’s the mechanism you’re thinking of, through which hype does damage?
This ship may have sailed at this point, but to me the main mechanism is getting other actors to pay attention, focusing them on the most effective kinds of capabilities work, and making it more politically feasible to raise support. E.g., I expect that the media firestorm around GPT-3 made it significantly easier to raise the capital + support within Google Brain to train PaLM. Legibly making a ton of money with it falls into a similar category for me.
Gopher is a good example of not really seeing much fanfare, I think? (Though I don’t spend much time on ML Twitter, so maybe there was loads lol)
And much conceptual work seems pretty serial to me, and hard to parallelize, for reasons like “intuitions from the lead researcher are difficult to share” and communication difficulties in general.
Ah, my key argument here is that most conceptual work is bad because it lacks good empirical examples, grounding, and feedback loops, and that if we were closer to AGI we would have these.
I agree that Risks from Learned Optimisation is important and didn’t need this, and it plausibly feels like a good example of serial work to me.
I expect that the media firestorm around GPT-3 made it significantly easier to raise the capital + support within Google Brain to train PaLM.
Wouldn’t surprise me if this were true, but I agree with you that it’s possible the ship has already sailed on LLMs. I think this is more so the case if you have a novel insight about which paths to AGI are more promising (similar to the scaling hypothesis in 2018): getting ~everyone to adopt that insight would significantly advance timelines, though I’d argue that publishing it (such that only the labs explicitly aiming at AGI, like OpenAI and DeepMind, adopt it) is not clearly less bad than hyping it up.
Gopher is a good example of not really seeing much fanfare, I think? (Though I don’t spend much time on ML Twitter, so maybe there was loads lol)
Surely this is because it didn’t say anything except “DeepMind is also now in the LLM game”, which wasn’t surprising given that Geoff Irving left OpenAI for DeepMind? There weren’t significant groundbreaking techniques used to train Gopher, as far as I can remember.
Chinchilla, on the other hand, did see a ton of fanfare.
Ah, my key argument here is that most conceptual work is bad because it lacks good empirical examples, grounding, and feedback loops, and that if we were closer to AGI we would have these.
Cool. I agree with you that conceptual work is bad in part because of a lack of good examples/grounding/feedback loops, though I think this can be overcome with clever toy problem design and analogies to current problems (that you can then get the examples/grounding/feedback loops from). E.g. surely we can test toy versions of shard theory claims using the small algorithmic neural networks we’re able to fully reverse engineer.
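To make the kind of toy setup gestured at here concrete, below is a minimal sketch (assuming PyTorch) of training a tiny network on modular addition, i.e. the sort of small algorithmic model that can plausibly be fully reverse engineered. The model, task, and hyperparameters are illustrative choices rather than any specific published experiment, and actually probing shard-theory-style claims would be further work built on top of a model like this.

```python
# Illustrative sketch only: train a tiny network on (a + b) mod P, the kind of
# small algorithmic model that can be fully reverse engineered.
# All names and hyperparameters here are hypothetical, not from a specific paper.
import torch
import torch.nn as nn

P = 113  # modulus for the toy task

# Full dataset of (a, b) -> (a + b) mod P pairs.
pairs = torch.tensor([(a, b) for a in range(P) for b in range(P)])
labels = (pairs[:, 0] + pairs[:, 1]) % P

class TinyAdder(nn.Module):
    """One-hidden-layer MLP over embedded tokens; small enough to inspect."""
    def __init__(self, d_embed=32, d_hidden=128):
        super().__init__()
        self.embed = nn.Embedding(P, d_embed)
        self.mlp = nn.Sequential(
            nn.Linear(2 * d_embed, d_hidden),
            nn.ReLU(),
            nn.Linear(d_hidden, P),
        )

    def forward(self, ab):
        e = self.embed(ab)             # (batch, 2, d_embed)
        return self.mlp(e.flatten(1))  # (batch, P) logits

model = TinyAdder()
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Full-batch training; the dataset is tiny (P * P examples).
for step in range(2000):
    opt.zero_grad()
    loss = loss_fn(model(pairs), labels)
    loss.backward()
    opt.step()
    if step % 500 == 0:
        acc = (model(pairs).argmax(-1) == labels).float().mean()
        print(f"step {step}: loss {loss.item():.3f}, acc {acc:.3f}")
```

Once trained, a model this small is the sort of artefact whose weights you can inspect directly, which is what would let you ground claims in concrete examples and feedback loops rather than purely conceptual argument.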