It might just be a perception problem. LLMs still don't seem to have a solid grasp of one letter being next to another, or of what a diagonal is. If you look at ARC-AGI results with o3, it does worse as the grid gets larger, a drawback humans don't share.
EDIT: Just tried this on o1 pro. It doesn't seem like a perception problem after all, though it still could be. I wonder if it's related to being a successful agent: the model may not yet properly track how a sequence of actions changes the state of a world. It's strange that this isn't unlocked by reasoning.
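For what it's worth, here's a toy sketch (my own illustration, not anything from the benchmark) of why the perception issue in my original comment would get worse with grid size: in a row-major text serialization, two cells that are diagonal neighbors on the grid sit width + 1 positions apart in the token stream, so "next to" stretches out as the grid grows.

```python
def flatten(grid):
    """Serialize a 2D grid row by row, roughly how it reaches an LLM as text."""
    return [cell for row in grid for cell in row]

def seq_distance(width, r1, c1, r2, c2):
    """Distance between two grid cells in the flattened sequence."""
    return abs((r1 * width + c1) - (r2 * width + c2))

for width in (5, 10, 30):
    # (0,0) and (1,1) are diagonal neighbors on the grid,
    # but width + 1 positions apart once the grid is flattened.
    print(width, seq_distance(width, 0, 0, 1, 1))
# 5 -> 6, 10 -> 11, 30 -> 31: spatial adjacency keeps
# getting less local in the sequence as grids grow.
```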