However, he was specifically referring to a clever innovation arriving several years in the future. If DeepMind's claim of a 10x efficiency improvement holds up, is that a bigger jump than Paul Christiano predicted would be plausible today?
I’d also be interested in hearing Paul/Eliezer’s takes on RETRO, though I don’t think they reported compute scaling curves in their paper?
Headline quote:
With a 2 trillion token database, our Retrieval-Enhanced Transformer (Retro) obtains comparable performance to GPT-3 and Jurassic-1 on the Pile, despite using 25× fewer parameters.
The relevant figures for performance are Figure 1 and Figure 3.