gwern comments on LLMs, Batches, and Emergent Episodic Memory

gwern 3 Jul 2023 0:33 UTC
2 points
Depends on what you want to do. Look at “dynamic evaluation” (bibliography) for an approach that keeps updating the weights at test time with a learning rate, rather than relying on an external memory like a neural cache.
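For a rough sense of what dynamic evaluation looks like mechanically, here is a minimal sketch. It assumes a toy bigram softmax model in pure Python standing in for a real language model; the names `dynamic_eval`, `softmax`, `W`, and `lr`, and the repeating toy sequence, are all hypothetical illustrations and not taken from any particular paper or library. Each token is scored first, and only then does the model take one gradient step on it, so whatever gets "remembered" lives in the weights and not in an external cache.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def dynamic_eval(tokens, vocab_size, lr):
    """Average next-token loss (in nats) over `tokens`.

    With lr > 0 the weights are updated after each prediction
    (dynamic evaluation); with lr == 0 the model stays frozen.
    """
    # W[prev][nxt] holds the logits of a toy bigram model (assumption:
    # zero init, i.e. a uniform predictor with no pretraining).
    W = [[0.0] * vocab_size for _ in range(vocab_size)]
    total_loss = 0.0
    for prev, nxt in zip(tokens, tokens[1:]):
        probs = softmax(W[prev])
        # Score the token before the model has been updated on it.
        total_loss += -math.log(probs[nxt])
        # One SGD step on the token just seen; the cross-entropy
        # gradient w.r.t. the logits is (probs - one_hot(nxt)).
        for j in range(vocab_size):
            grad = probs[j] - (1.0 if j == nxt else 0.0)
            W[prev][j] -= lr * grad
    return total_loss / (len(tokens) - 1)

# A short repeating sequence: anything recalled on later repeats
# can only have come from the earlier gradient steps.
text = [0, 1, 2, 3] * 10
static_loss = dynamic_eval(text, vocab_size=4, lr=0.0)
adapted_loss = dynamic_eval(text, vocab_size=4, lr=1.0)
print(f"frozen: {static_loss:.3f} nats, dynamic eval: {adapted_loss:.3f} nats")
```

The frozen model stays at the uniform-guess loss, while the adapted one improves on every repeat. Comparing the two losses on held-out text is one simple way to measure how much a model picks up from recent updates alone.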
- Lao Mein 3 Jul 2023 5:23 UTC
  2 points
  I’m mostly just curious about how difficult it is for a transformer to learn to effectively access information from recent backprop updates, without using outside structures. Can it recall an essay title? The general topic? And how well does this work for stochastic vs. batch updates? Thanks a lot btw.