Mechanisms like attention only seem analogous to a human's sensory memory. Reasoning models have something like a working memory, but even then I think we'd need something in embedding space to constitute a real working-memory analog. And having something like a short-term memory could help Claude avoid repeating the same mistakes.
This is, in some sense, very scary, because once someone figures out how to train agent reasoning in embedding space, there might be a dramatic discontinuity in how well LLMs can act as agents.
Maybe, but on reasonable interpretations I think this should cause us to expect AGI to be farther off, not nearer.