And now in the second run it has entered a similar delusional loop. It knows the way to Cerulean City is via Route 4, but the route before and after Mt. Moon are both considered part of Route 4. Therefore it deluded itself into thinking it can get to Cerulean from the first part of the route. Because of that, every time it accidentally stumbles into Mt Moon and is making substantial progress towards the exit, it intentionally blacks out to get teleported back outside the entrance, so it can look for the nonexistent path forwards.
From what I’ve seen on stream, the chances of it questioning and breaking from this delusion are basically zero. There’s still the possibility of progress by getting lost in Mt Moon and stumbling into the exit, but it will never actually figure out what it was doing wrong here.
Also, the stream admin seemed to think the same thing, saying during the first run that “some runs just are cursed” and setting up a poll for whether to reset the game.
Claude finally made it to Cerulean after the “Critique Claude” component correctly identified that it was stuck in a loop, and decided to go through Mt. Moon. (I think Critique Claude is prompted specifically to stop loops.)
And now in the second run it has entered a similar delusional loop. It knows the way to Cerulean City is via Route 4, but the route before and after Mt. Moon are both considered part of Route 4. Therefore it deluded itself into thinking it can get to Cerulean from the first part of the route. Because of that, every time it accidentally stumbles into Mt Moon and is making substantial progress towards the exit, it intentionally blacks out to get teleported back outside the entrance, so it can look for the nonexistent path forwards.
From what I’ve seen on stream, the chances of it questioning and breaking from this delusion are basically zero. There’s still the possibility of progress by getting lost in Mt Moon and stumbling into the exit, but it will never actually figure out what it was doing wrong here.
People in the stream chat and subreddit have been discussing this paper suggesting that LLM agents often get into these “meltdown” loops that they aren’t able to recover from: https://www.reddit.com/r/ClaudePlaysPokemon/comments/1j65jqf/vendingbench_a_benchmark_for_longterm_coherence
Also, the stream admin seemed to think the same thing, saying during the first run that “some runs just are cursed” and setting up a poll for whether to reset the game.
Update: Claude made it to Cerulean City today, after wandering the Mt. Moon area for 69 hours.
Claude finally made it to Cerulean after the “Critique Claude” component correctly identified that it was stuck in a loop, and decided to go through Mt. Moon. (I think Critique Claude is prompted specifically to stop loops.)