I, also, am skeptical. The weird spikiness of the abilities of LLMs throws off our ability to place them at a skill level which makes sense from a human perspective. They have some reasoning ability, but it is negatively impacted by incorrect pattern-matching to their huge reservoir of memorized patterns. So, depending on context, they might reason at the level of a six-year-old or a 14-year-old, whilst having more PhD-level facts memorized than any human ever has. Weird. How do we rank such a thing against a human skill chart? It does not follow human developmental progressions. As soon as it is an agent with long-horizon execution ability, it will necessarily be a superhuman one, because it already has superhuman skills in speed and factual knowledge recall.
Levels 3 and 5 thus seem closely linked to me. I would be surprised to see one without the other.
Similarly, levels 2 and 4 seem fairly closely linked, although less so than 3 and 5. Still, if I saw one without the other, I would expect the other to follow very soon. So, as David says, the ordering makes little sense.