Of course, much of that comes down to basic capability issues: poor spatial reasoning, short-term memory that doesn’t come anywhere close to lasting for one lap, etc.
But I’ve also noticed ways in which Claude’s personality is sabotaging it. Claude is capable of taking notes saying that it “THOROUGHLY confirmed NO passages” through the eastern barrier, but it never gets impatient or frustrated, so this doesn’t actually prevent it from trying the same thing every time it sees the eastern wall again.
And in general, it seems to have a strong bias towards visiting places that are mentioned frequently in its notes, even though that’s the exact opposite of what you should be doing for exploration. I’ve seen it reach the rarely reached second ladder on the floor, and then promptly decide it needs to run back to the first ladder (which it has seen hundreds of times) to see whether the first ladder goes anywhere.
And it should definitely be mentioned that run #1 was mercy-killed when its knowledge base was populated almost entirely with falsehoods, both about how far it had progressed in the game and about how to get further, leading to a single-minded obsession with exploring the southern wall of Cerulean City forever.
And now in the second run it has entered a similar delusional loop. It knows the way to Cerulean City is via Route 4, but the stretches before and after Mt. Moon are both considered part of Route 4. Therefore it deluded itself into thinking it can get to Cerulean from the first part of the route. Because of that, every time it accidentally stumbles into Mt. Moon and is making substantial progress towards the exit, it intentionally blacks out to get teleported back outside the entrance, so it can look for the nonexistent path forwards.
From what I’ve seen on stream, the chances of it questioning and breaking from this delusion are basically zero. There’s still the possibility of progress by getting lost in Mt. Moon and stumbling into the exit, but it will never actually figure out what it was doing wrong here.
Also, the stream admin seemed to think the same thing, saying during the first run that “some runs just are cursed” and setting up a poll for whether to reset the game.
Claude finally made it to Cerulean after the “Critique Claude” component correctly identified that it was stuck in a loop, and decided to go through Mt. Moon. (I think Critique Claude is prompted specifically to stop loops.)
This basically sums up how it’s doing: https://www.reddit.com/r/ClaudePlaysPokemon/comments/1j568ck/the_mount_moon_experience
People in the stream chat and subreddit have been discussing this paper suggesting that LLM agents often get into these “meltdown” loops that they aren’t able to recover from: https://www.reddit.com/r/ClaudePlaysPokemon/comments/1j65jqf/vendingbench_a_benchmark_for_longterm_coherence
Update: Claude made it to Cerulean City today, after wandering the Mt. Moon area for 69 hours.