Arguably this is what happened with LSTMs?
Is there a reference for this?
https://www.gwern.net/images/ai/gpt/2020-kaplan-figure7-rnnsvstransformers.png
What Gwern said. :) But I don’t know for sure what the person I talked to had in mind.