It seems that o4-mini-high (released today) is able to solve the first problem in one attempt, though it needs some prompting to explain its solution. It first asserts that the minimal number of moves is 15. If you ask it to list the moves, it does so, and the list looked valid when I checked it. If asked to prove that 15 is minimal, it reports that a BFS shows that 15 is minimal.
I’m not sure if this fully counts as a success, as I suspect it wrote code to perform the BFS while generating the answer. It was also unable to point out that, given a valid 15-move sequence, that sequence MUST be minimal, because the sum of the taxicab distances between the initial and final positions is 15. I’ve included the chat link below.
https://chatgpt.com/share/68004414-78f8-8004-8f02-de904d969489
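For what it's worth, the taxicab argument can be sketched in a few lines of Python. This is only an illustration under the assumption that each move shifts one piece by one cell horizontally or vertically, so a move can reduce the total taxicab distance by at most one. The function names and the toy coordinates are hypothetical, not the positions from the actual problem.

```python
def taxicab_lower_bound(start, goal):
    """Sum of |dx| + |dy| over matching pieces.

    Assumption: one move shifts one piece by one cell orthogonally,
    so no solution can use fewer moves than this total.
    """
    return sum(
        abs(sx - gx) + abs(sy - gy)
        for (sx, sy), (gx, gy) in zip(start, goal)
    )


def is_provably_minimal(num_moves, start, goal):
    """A valid sequence whose length equals the lower bound is minimal."""
    return num_moves == taxicab_lower_bound(start, goal)


# Hypothetical toy positions (NOT the puzzle from the post).
toy_start = [(0, 0), (2, 1)]
toy_goal = [(1, 2), (0, 0)]

print(taxicab_lower_bound(toy_start, toy_goal))
print(is_provably_minimal(6, toy_start, toy_goal))
```

The same check applied to the real puzzle would compare the length of the valid 15-move sequence with the taxicab total of 15, with no search needed.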
I’d say that anything that gives the right result counts as a success.