Models with long-term memory are very hard to train. Instead of being able to compute a weight update after seeing a single input, you have to run through a long loop of “put thing in memory, take thing out, compute with it, etc.” before you can compute one. It’s not a priori impossible, but nobody’s managed to get it to work. Evolution has figured out how to do it because it’s willing to waste an entire lifetime to get a single noisy update.
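To make that concrete, here’s a toy sketch (PyTorch; the sizes, layers, and memory scheme are all made up for illustration) of why the credit assignment is so delayed: every memory read and write in an episode has to happen before there’s any loss to backpropagate.

    import torch
    import torch.nn as nn

    # Toy memory-augmented episode: the model reads and writes an external
    # memory many times before a single loss (and weight update) exists.
    d = 16
    controller = nn.GRUCell(d, d)        # decides what to write and read
    write_proj = nn.Linear(d, d)
    read_proj = nn.Linear(d, d)
    params = (list(controller.parameters()) + list(write_proj.parameters())
              + list(read_proj.parameters()))
    opt = torch.optim.SGD(params, lr=1e-2)

    memory = []                          # external memory, grows over the episode
    h = torch.zeros(1, d)
    xs = torch.randn(100, 1, d)          # a long episode of inputs

    for x in xs:                         # the whole episode runs before any update
        h = controller(x, h)
        memory.append(write_proj(h))     # "put thing in memory"
        keys = torch.stack(memory)       # "take thing out"
        attn = torch.softmax((keys * read_proj(h)).sum(-1), dim=0)
        h = h + (attn.unsqueeze(-1) * keys).sum(0)  # "compute with it"

    loss = h.pow(2).mean()               # only now is there a learning signal
    loss.backward()                      # gradient flows back through every
    opt.step()                           # read/write in the episode: one noisy update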
People have been working on this for years. It’s remarkable (in retrospect, to me) that we’ve gotten as far as we have without long-term memory.
Isn’t that the point of the original transformer paper? I haven’t actually read it, just going by summaries I’ve read here and there.
If I don’t misremember, RNNs should be especially difficult to train in parallel.
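Roughly, the reason (sketched below in PyTorch, purely illustrative) is that each hidden state depends on the previous one, so the time steps can’t be computed concurrently, whereas an attention layer handles all positions in one batched call.

    import torch
    import torch.nn as nn

    # The recurrence is inherently sequential: step t needs the hidden state
    # from step t-1, so the loop below can't be parallelized over time.
    cell = nn.RNNCell(16, 16)
    x = torch.randn(512, 8, 16)          # (time, batch, features)
    h = torch.zeros(8, 16)
    for t in range(x.size(0)):           # 512 dependent steps, one after another
        h = cell(x[t], h)

    # An attention layer sees all 512 positions in a single batched call,
    # so the work parallelizes across the time dimension during training.
    attn = nn.MultiheadAttention(embed_dim=16, num_heads=2)
    out, _ = attn(x, x, x)               # one call over the whole sequence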
Transformers take O(n^2) computation for a context window of size n, because they effectively feed everything inside the context window to every layer. The context window provides the benefits of a small working memory, but it doesn’t scale. The model has no way of remembering things from before the context window, so it’s like a human with a busted hippocampus (Korsakoff’s syndrome) who can’t make new memories.
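To spell out where the n^2 comes from, here’s a minimal sketch of scaled dot-product attention (the sizes are arbitrary): every position’s query is scored against every position’s key, so each layer materializes an n-by-n matrix.

    import torch

    n, d = 1024, 64                      # context length and head dimension
    q = torch.randn(n, d)                # one query per position
    k = torch.randn(n, d)                # one key per position
    v = torch.randn(n, d)                # one value per position

    scores = q @ k.T / d ** 0.5          # (n, n): n^2 pairwise scores per layer/head
    weights = torch.softmax(scores, dim=-1)
    out = weights @ v                    # each position mixes info from all n others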
I suspect much of the reason we haven’t needed long-term memory so far is that we can increase the context window pretty cheaply, so long-term memory gets deprioritized.