There are two issues here:
Firstly, Theory of Mind is an important component of normal human thought, but note that it requires lengthy social-experience training data to form; there are thus historical examples of humans who probably also lacked normal human-level theory of mind: feral children.
Secondly, and more importantly, the examples you cite are not actually strong direct evidence that powerful LLMs lack theory of mind; they are better evidence that LLMs lack a ‘folk physics’ world model and the associated mental scratchpads, i.e. what Marcello calls “Board Vision”. The LLM has a hard time instantiating and visualizing a scene well enough to reason about visual line of sight, translucency, etc. If you test LLMs on ToM problems that involve pure verbal reasoning among the agents, I believe they score closer to human level. ToM requires simulating other agents’ minds at some level, and thus requires specific mental machinery depending on the task.
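(To make that distinction concrete, here is a rough sketch, not drawn from the discussion above, of the two kinds of false-belief probe: one that leans on visual/spatial scene modeling and one that only requires tracking who was told what. The `query_llm` helper is purely hypothetical, a stand-in for whichever LLM client you happen to use.)

```python
# Contrast between a "Board Vision"-heavy ToM probe and a purely verbal one.
# `query_llm` is a hypothetical placeholder; swap in a real chat-completion call.

VISUAL_PROBE = (
    "Sally puts her marble in a transparent glass box and leaves the room. "
    "Anne moves the marble to an opaque basket while Sally watches through a window. "
    "Where will Sally look for her marble when she returns, and why?"
)

VERBAL_PROBE = (
    "A bag labeled 'popcorn' actually contains chocolate. "
    "Sam has never looked inside, but a trusted friend tells him it contains chocolate. "
    "What does Sam believe is in the bag, and why?"
)

def query_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM API call."""
    return "<model answer here>"

if __name__ == "__main__":
    # The claim above predicts better performance on the verbal probe,
    # since it needs no scene visualization, only tracking of verbal information flow.
    for name, probe in [("visual", VISUAL_PROBE), ("verbal", VERBAL_PROBE)]:
        print(f"{name}: {query_llm(probe)}")
```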
You make interesting points. What about the other examples of the ToM task (the agent writing the false label themselves, or having been told by a trusted friend what is actually in the bag)?