Transformers take O(n^2) computation for a context window of size n, because they effectively feed everything inside the window to every layer. That buys the benefits of a small working memory, but it doesn't scale. And a transformer has no way of remembering anything from before the context window, so it's like a human with a busted hippocampus (Korsakoff's syndrome) who can't form new memories.
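To make the quadratic cost concrete, here's a minimal sketch of single-head self-attention in plain NumPy (my own illustration, not code from any particular model): the score matrix is n × n, so compute and memory both grow quadratically with the context length n, and nothing outside the window is ever visible.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention over a length-n sequence.

    X: (n, d) token embeddings for the current context window.
    The scores matrix below is (n, n), which is where the O(n^2)
    cost in both compute and memory comes from.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv            # each (n, d)
    scores = Q @ K.T / np.sqrt(K.shape[-1])     # (n, n)  <-- the quadratic term
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                          # (n, d)

# Every token attends to every other token in the window;
# tokens from before the window simply aren't in X at all.
n, d = 1024, 64
rng = np.random.default_rng(0)
X = rng.standard_normal((n, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)   # the scores alone hold n*n ≈ 1M floats
```

Double the window and the score matrix quadruples, which is the scaling wall the paragraph above is pointing at.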