LLMs trying to complete long-term tasks are state machines where the context is their state. At the moment they have terrible tools for editing that state. There is no location in memory that automatically receives more attention, because the important memories move as the chain of thought does. Thinking off on a tangent throws a lot of garbage into the LLM's working memory. To remember an important fact over time the LLM has to keep repeating it. And there isn't enough space in working memory for long-term tasks.
All of this is exemplified really clearly by Claude playing Pokémon. Presumably similar lessons can be learned by watching LLMs attempt other long-horizon tasks.
Since long-horizon tasks, a.k.a. agents, are at the center of what AI companies are trying to get their models to do right now, they need to fix these problems. So I expect AI companies to give their models static memory and long-term tasks and let them learn, through reinforcement learning, how to use that memory effectively.
Right now it is either not being used or not being used well as part of Claude Plays Pokémon. If Claude were taught to optimize its context as part of thinking, planning and acting, it would play much better.
By static memory I mean a mental workspace that is always part of the context but is only edited intentionally, as opposed to the ever-changing stream of consciousness that dominates contexts today. Claude Plays Pokémon was given something like this and uses it really poorly.
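To make the idea concrete, here is a minimal sketch of what such a workspace could look like in an agent loop. The class and method names are illustrative, not any real agent framework's API: the workspace is pinned at the start of every prompt and changes only through an explicit edit, while the ordinary transcript scrolls past and gets truncated.

```python
from dataclasses import dataclass, field

@dataclass
class StaticMemoryAgent:
    workspace: str = ""              # persistent state, edited only intentionally
    transcript: list[str] = field(default_factory=list)  # stream of consciousness
    transcript_budget: int = 20      # only the most recent turns survive truncation

    def build_prompt(self) -> str:
        # The workspace is always included verbatim; the transcript is truncated.
        recent = self.transcript[-self.transcript_budget:]
        return ("WORKSPACE (edit only via edit_workspace):\n" + self.workspace +
                "\n\nRECENT TURNS:\n" + "\n".join(recent))

    def record_turn(self, text: str) -> None:
        # Ordinary thoughts go into the scrolling transcript and eventually fall out.
        self.transcript.append(text)

    def edit_workspace(self, new_text: str) -> None:
        # The only way the persistent state changes: an explicit, deliberate edit.
        self.workspace = new_text


# A fact written to the workspace survives long after the turn that produced it
# has scrolled away, without the model having to keep repeating it.
agent = StaticMemoryAgent()
agent.edit_workspace("Goal: beat Brock. Key item: Potion in Viridian Forest.")
for step in range(100):
    agent.record_turn(f"turn {step}: wandering, battling, musing...")
print(agent.build_prompt())
```

The point of the sketch is the split itself: what the model merely thinks gets forgotten by default, and what it deliberately writes down persists by default, which is roughly the opposite of how raw context windows behave today.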