Transformers take O(n^2) computation for a context window of size n, because they effectively feed everything inside the context window to every layer. This gives the benefits of a small working memory, but it doesn't scale. They have no way of remembering things from before the context window, so it's like a human with a busted hippocampus (Korsakoff's syndrome) who can't form new memories.
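A minimal sketch of where the O(n^2) comes from (illustrative only, not any particular implementation): naive self-attention builds an n-by-n score matrix, so both time and memory grow quadratically in the window size n.

```python
import numpy as np

def attention(q, k, v):
    # scores is an (n, n) matrix: every position attends to every
    # other position, hence O(n^2) time and memory in window size n
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

n, d = 512, 64  # window size and head dimension, arbitrary example values
x = np.random.randn(n, d)
out = attention(x, x, x)
print(out.shape)  # (512, 64) -- but the intermediate scores were (512, 512)
```

Doubling n doubles the output size but quadruples the score matrix, which is the scaling problem in a nutshell.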
Isn't that the point of the original transformer paper? I haven't actually read it, just going by summaries I've read here and there.
If I don't misremember, RNNs should be especially difficult to train in parallel.
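Right, and a toy sketch shows why (hypothetical minimal RNN, not any real library's API): each hidden state depends on the previous one, so the n time steps form a serial chain that can't be parallelized across the sequence, unlike attention, which processes all positions at once.

```python
import numpy as np

def rnn_step(h, x, W_h, W_x):
    # new hidden state depends on the *previous* hidden state h
    return np.tanh(h @ W_h + x @ W_x)

n, d = 8, 4  # sequence length and hidden size, arbitrary example values
rng = np.random.default_rng(0)
W_h, W_x = np.eye(d) * 0.5, np.eye(d) * 0.5
h = np.zeros(d)
# This loop is inherently sequential: step t cannot start until
# step t-1 has produced h, so there is no parallelism over time.
for x in rng.standard_normal((n, d)):
    h = rnn_step(h, x, W_h, W_x)
print(h.shape)  # (4,)
```

That serial dependency over time steps is exactly what the transformer's all-at-once attention avoids, at the cost of the quadratic window.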