Claude has been playing Pokemon for the last few days. It’s still playing, live on Twitch. You can go watch alongside hundreds of other people. It’s fun.
What updates should I make about AGI timelines from Claude’s performance? Let’s think step by step.
First, it’s cool that Claude can do this at all. The game keeps track of “Step count” and Claude is over 30,000 already; I think that means 30,000 actions (e.g. pressing the A button). For each action, Claude produces about a paragraph of thinking tokens to decide what to do. Any way you slice it this is medium-horizon agency at least—Claude is operating fully autonomously, in pursuit of goals, for a few days. Does this mean long-horizon agency is not so difficult to train after all?
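To make the setup concrete, here is a minimal sketch of what a “paragraph of thinking per button press” loop could look like. Everything below (the emulator and LLM interfaces, the helper functions, the context length) is a hypothetical illustration, not the actual Claude Plays Pokemon scaffold:

```python
# Hypothetical sketch of an agent loop: observe the game, think for ~a paragraph,
# press one button, repeat. The emulator/llm objects and helpers are invented for
# illustration; the real scaffold is not public in detail.

BUTTONS = {"A", "B", "UP", "DOWN", "LEFT", "RIGHT", "START", "SELECT"}

def build_prompt(screen_text, recent_history):
    log = "\n".join(f"Thought: {t}\nAction: {a}" for t, a in recent_history)
    return (
        "You are playing Pokemon Red. Think about what to do next, "
        "then end with exactly one button name.\n\n"
        f"Recent history:\n{log}\n\nCurrent screen:\n{screen_text}\n"
    )

def parse_button(thought):
    # Take the last recognized button name mentioned in the model's reasoning.
    words = [w.strip(".,!").upper() for w in thought.split()]
    for w in reversed(words):
        if w in BUTTONS:
            return w
    return "A"  # fallback if no button is named

def run_agent(emulator, llm, max_steps=30_000):
    history = []  # (thought, action) pairs; only a recent slice fits in context
    for _ in range(max_steps):
        screen = emulator.get_screen_text()   # e.g. RAM- or OCR-derived game state
        thought = llm.generate(build_prompt(screen, history[-20:]))
        action = parse_button(thought)
        emulator.press(action)
        history.append((thought, action))
```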
Not so fast. Pokemon is probably an especially easy environment, and Claude is still making basic mistakes even so. In particular, Pokemon seems to have a relatively linear world where there’s a clear story/path to progress along, and moreover Claude’s pretraining probably teaches it the whole story + lots of tips & tricks for how to complete it. In D&D terms the story is running on rails.
I think I would have predicted in advance that this dimension of difficulty would matter, but also I feel validated by Claude’s performance—it seems that Claude is doing fine at Pokemon overall, except that Claude keeps getting stuck/lost wandering around in various places. It can’t seem to keep a good memory of what it’s already tried / where it’s already been, and so it keeps going in circles, until eventually it gets lucky and stumbles to the exit. A more challenging video game would be something open-ended and less-present-in-training-data like Dwarf Fortress.
On the other hand, maybe this is less a fundamental limitation Claude has and more a problem with its prompt/scaffold? Because it has a limited context window it has to regularly compress it by e.g. summarizing / writing ‘notes to self’ and then deleting the rest. I imagine there’s a lot of room for improvement in prompt engineering / scaffolding here, and then further low-hanging fruit in training Claude to make use of that scaffolding. And this might ~fully solve the going-in-circles problem. Still, even if so, I’d bet that Claude would perform much worse in a more open-ended game it didn’t have lots of background knowledge about.
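For illustration, here is one minimal sketch of the kind of compression scaffold described above, with all names and thresholds invented for the example (the real setup may work quite differently):

```python
# Hypothetical sketch of context compression via "notes to self": once the raw
# transcript gets too long, ask the model to distill it into durable notes and
# drop most of the detail. All names and thresholds are invented for illustration.

MAX_TRANSCRIPT_CHARS = 50_000
KEEP_RECENT = 10  # raw entries to keep after compressing

def compress_context(llm, transcript, notes):
    """Return a (possibly shortened) transcript and the updated persistent notes."""
    if sum(len(entry) for entry in transcript) < MAX_TRANSCRIPT_CHARS:
        return transcript, notes
    summary = llm.generate(
        "Summarize this play log into notes for your future self: where you have "
        "been, what you have already tried, and what you plan to do next.\n\n"
        + "\n".join(transcript)
    )
    notes.append(summary)                   # durable memory survives the deletion
    return transcript[-KEEP_RECENT:], notes
```

On this picture, the going-in-circles failure comes down to how much spatial detail (e.g. “I already tried the north exit of this room”) survives each round of summarization, which is exactly the sort of thing better prompting or training could plausibly improve.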
So anyhow what does this mean for timelines? Well, I’ll look forward to seeing AIs getting better at playing Pokemon zero-shot (i.e. without having trained on it at all) over the course of the year. I think it’s a decent benchmark for long-horizon agency; not perfect, but we don’t have perfect benchmarks yet. I feel like Claude’s current performance is not surprising enough to update me one way or another from my 2028 timelines. If the models at the end of 2025 (EDIT: I previously accidentally wrote “2028” here) are not much better, that would probably make me want to push my median out to 2029 or 2030. (My mode would probably stay at 2027.)
What would really impress me though (and update me towards shorter timelines) is multi-day autonomous operation in more open-ended environments, e.g. Dwarf Fortress. (DF is also just a much less forgiving game than Pokemon. It’s so easy to get wiped out. So it really means something if you are still alive after days of playtime.)
Or, of course, multi-day autonomous operation on real-world coding or research tasks. When that starts happening, I think we have about a year left till superintelligence, give or take a year.
I don’t know if this is helpful, but as someone who was quite good at competitive Pokemon in their teenage years and still keeps up with Nuzlocke-type runs for fun, I would note that Pokemon’s game design is meant to be a low-context-intensity RPG, especially in the early generations, where the linearity is pushed so that kids can complete it.
If your point about agency holds true, I think the more important pinch points will be Lavender Town and Sabrina, because those require backtracking through the storyline to get things.
I think mid-to-late-game GSC would also be important to try, because there are huge level gaps and transitions in the storyline that would make it hard to progress.
Note for posterity: “Let’s think step by step” is a joke.
I downvoted this and I feel the urge to explain myself—the LLMism in the writing is uncanny.
The combination of “Let’s think step by step”, “First…” and “Not so fast…” gives me a subtle but dreadful impression that a highly valued member of the community is being finetuned by model output in real time. This emulation of the “Wait, but!” pattern is a bit too much for my comfort.
My comment doesn’t have much to do with the content; it’s more about how unsettled I feel. I don’t think LLM outputs are all necessarily infohazardous—but I am beginning to see the potential failure modes that people have been gesturing at for a while.
I assume “let’s think step by step” is a joke/on purpose. The “first” and “not so fast” on their own don’t seem that egregious to me.
“Let’s think step by step” was indeed a joke/on purpose. Everything else was just my stream of consciousness… my “chain of thought,” shall we say. I more or less wrote down thoughts as they came to me. Perhaps I’ve been influenced by reading LLM CoTs, though I haven’t done very much of that. Or perhaps this is just what thinking looks like when you write it down?
I’ve spent enough time staring at LLM chains of thought now that when I started thinking about a thing for work, I found my thoughts taking the shape of an LLM thinking about how to approach its problem. And that actually felt like a useful, systematic way of approaching the problem, so I started writing out that chain of thought as if I were an LLM, and that felt valuable in helping me stay focused.
Of course, I had to amuse myself by starting the chain-of-thought with “The user has asked me to...”
The BALROG eval has NetHack. I want to see an LLM try to beat that.
RuneScape would be a good one.