Humans can remember a 10-digit phone number in working memory – AIs will be able to hold the entirety of Wikipedia in working memory
In the context of an LLM, working memory is not its training dataset. The training dataset, in condensed and "pattern-ized" form, is its long-term memory. Working memory is its "context window", so 8k or 32k tokens right now. Which on one hand is much better than a 10-digit number, but on the other, this comparison grossly underestimates the amount of data a person holds in their "working memory" without thinking much about it: "Where am I, what am I doing, why am I doing this, who just passed me, who is sitting behind me, what tools do I have available at the moment..." We never put any of this into actual words inside our heads, but we still hold all of it in our working memory.
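For scale, here is a rough back-of-envelope sketch of how far a 32k-token context window is from "the entirety of Wikipedia". The Wikipedia word count and the tokens-per-word ratio below are loose assumptions for illustration, not exact figures:

```python
# Back-of-envelope comparison: Wikipedia vs. a 32k context window.
# Assumptions (rough, illustrative): English Wikipedia has on the
# order of ~4.5 billion words of article text, and English averages
# roughly ~1.3 tokens per word under common tokenizers.

wikipedia_words = 4.5e9    # assumed word count of English Wikipedia
tokens_per_word = 1.3      # assumed average tokens per English word
context_window = 32_000    # tokens, the larger figure quoted above

wikipedia_tokens = wikipedia_words * tokens_per_word
ratio = wikipedia_tokens / context_window

print(f"Wikipedia:      ~{wikipedia_tokens:.1e} tokens")
print(f"Context window:  {context_window:,} tokens")
print(f"Wikipedia is roughly {ratio:,.0f}x the context window")
```

Under these assumptions the gap is around five orders of magnitude, so "holding all of Wikipedia in working memory" describes some hypothetical future system, not today's context windows.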