Martín Soto comments on So how well is Claude playing Pokémon?

Martín Soto 8 Mar 2025 14:42 UTC
It’s unclear what the optimal amount of thinking per step is. My initial guess would have been that letting Claude think for a whole paragraph before every single action (rather than only every 10 actions, or whenever it’s in a match, or whatever) scores slightly better than letting it think more (sequentially). But I guess this might work better if it’s what the streamer settled on after some iteration.
The story for parallel checks could be different though. My guess is that going all out would reduce robustness problems (like “I close a goal before actually achieving it”): let Claude generate the paragraph 5 times, then generate 5 more parallel paragraphs about whether it has gotten something wrong, then have a lower-context version of Claude decide whether there are any important disagreements, and if there are none, just majority-vote. But maybe this adds too much bloat and too many opportunities for mistakes, or fixes some mistakes while making others way worse.
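To make the parallel-check idea concrete, here is a rough Python sketch of the pipeline. Everything in it is hypothetical: `propose`, `critique`, `has_important_disagreement` and `extract_action` stand in for whatever model calls and parsing the harness would actually use, and pairing each critique with exactly one proposal is my assumption. What should happen when the judge does flag a disagreement is left open here, so the sketch just returns `None` and leaves that to the caller.

```python
from collections import Counter
from typing import Callable, List, Optional


def robust_action(
    propose: Callable[[str], str],
    critique: Callable[[str, str], str],
    has_important_disagreement: Callable[[List[str], List[str]], bool],
    extract_action: Callable[[str], str],
    state: str,
    n: int = 5,
) -> Optional[str]:
    """Return the majority-vote action, or None if the judge flags a disagreement.

    All four callables are hypothetical placeholders for model calls or parsers.
    """
    # Step 1: sample n reasoning paragraphs for the same game state.
    proposals = [propose(state) for _ in range(n)]

    # Step 2: write one critique per proposal ("did I get something wrong?").
    # Assumption: each critique looks at a single proposal, not at all of them.
    critiques = [critique(state, p) for p in proposals]

    # Step 3: a lower-context judge sees only the proposals and critiques,
    # not the full history, and decides whether anything important conflicts.
    if has_important_disagreement(proposals, critiques):
        return None  # the caller decides how to handle this (not specified here)

    # Step 4: no important disagreement, so take a plain majority vote
    # over the action each proposal ends up recommending.
    votes = Counter(extract_action(p) for p in proposals)
    return votes.most_common(1)[0][0]
```

Passing the model calls in as plain functions keeps the voting logic testable with canned strings before any real model is wired in, which seems useful given the worry about added bloat.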