jeff8765 comments on Surprising LLM reasoning failures make me think we still need qualitative breakthroughs for AGI

jeff8765 17 Apr 2025 0:02 UTC
16 points
0
It seems that o4-mini-high (released today) is able to solve the first problem in one attempt, though it needs some prompting to explain its solution. It first asserts that the minimal number of moves is 15. If you ask it to list the moves, it is able to do so, and the list seemed valid when I checked it. If asked to prove that 15 is minimal, it reports that a BFS shows that 15 is minimal.
I’m not sure if this fully counts as a success, as I suspect it wrote code to perform the BFS while generating the answer. It was also unable to point out that, given a valid 15-move sequence, it MUST be minimal, as the sum of the taxicab distances between the initial and final positions is 15. I’ve included the chat link below.
https://chatgpt.com/share/68004414-78f8-8004-8f02-de904d969489
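The taxicab argument above can be sketched in a few lines. This is only an illustration under stated assumptions: it assumes each move shifts a single piece by one square orthogonally, so every move reduces the total taxicab distance by at most 1. The function names and the piece placements are hypothetical and are not the positions from the actual puzzle.

```python
def taxicab_lower_bound(start, goal):
    """Sum of per-piece taxicab (Manhattan) distances from start to goal.

    Assumption: one move shifts one piece by one square orthogonally,
    so no solution can use fewer moves than this sum.
    """
    return sum(
        abs(start[piece][0] - goal[piece][0])
        + abs(start[piece][1] - goal[piece][1])
        for piece in start
    )


def is_provably_minimal(num_moves, start, goal):
    """True if a valid solution of this length meets the lower bound.

    A False result does not mean the solution is suboptimal, only that
    this particular bound is not tight enough to prove minimality.
    """
    return num_moves == taxicab_lower_bound(start, goal)


# Hypothetical placements (piece -> (row, col)), NOT the real puzzle.
start = {"A": (0, 0), "B": (2, 1)}
goal = {"A": (2, 1), "B": (0, 0)}

bound = taxicab_lower_bound(start, goal)
print(bound, is_provably_minimal(bound, start, goal))
```

With the real puzzle, a valid 15-move sequence together with a bound of 15 would settle minimality without any search.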
- Kaj_Sotala 17 Apr 2025 4:27 UTC
  5 points
  5
  I’m not sure if this fully counts as a success, as I suspect it wrote code to perform the BFS while generating the answer.
  I’d say that anything that gives the right result counts as a success.