Misnaming and Other Issues with OpenAI's “Human Level” Superintelligence Hierarchy

Bloomberg reports that OpenAI internally has benchmarks for “Human-Level AI.” There are five levels: level 1 is the already-achieved ability to hold an intelligent conversation, followed by level 2, “[unassisted, PhD-level] Reasoners,” level 3, “Agents,” level 4, systems that can “come up with new innovations,” and finally level 5, “AI that can do the work of… Organizations.”

The levels, in brief, are:

1 - Conversation
2 - Reasoning
3 - Agent
4 - Innovation
5 - Organization

This is being reported secondhand, but even with that caveat, there seem to be some major issues with the ideas. Below, I outline the two I consider most important.

...but this is Superintelligence

First, given the levels of capability being discussed, OpenAI’s typology is, at least at higher levels, explicitly discussing superintelligence, rather than “Human-Level AI.” To see this, I’ll use Bostrom’s admittedly imperfect definitions. He starts by defining superintelligence as “intellects that greatly outperform the best current human minds across many very general cognitive domains,” then breaks down several ways this could occur.

Starting off, his typology defines speed superintelligence as “an intellect that is just like a human mind but faster.” This would arguably include even their level 2, which OpenAI reportedly says “its technology is approaching,” since a system that can do “basic problem-solving tasks as well as a human with a doctorate-level education who doesn’t have access to any tools” already runs far faster than humans do. Moreover, they are describing a system with already-superhuman recall and multi-domain expertise compared to humans, and inference using these systems is easily superhumanly fast.

Level 4, AI that can come up with innovations (presumably ones which humans have not), would potentially be a quality superintelligence, “at least as fast as a human mind and vastly qualitatively smarter,” though the threshold for “vastly” is very hard to quantify. Level 5, in turn, is called “Organizations,” which presumably means replacing entire organizations with multi-part AI-controlled systems, and would be what Bostrom calls “a system achieving superior performance by aggregating large numbers of smaller intelligences.”

However, it is possible that in their framework, OpenAI means something that is, perhaps definitionally, not superintelligence. That is, they will define these as systems only as capable as humans or human organizations, rather than far outstripping them. And this is where I think their levels are not just misnamed, but fundamentally confused—as presented, these are not levels, they are conceptually distinct possible applications, pathways, or outcomes.

Ordering Levels?

Second, as I just noted, the claim that these five distinct descriptions are “levels” that can be used to track progress implies not only that OpenAI has a clear idea of what would be required for each stage, but also that they have a roadmap showing that the levels would happen in the specified order. That seems very hard to believe, on both counts. I won’t go into why I think they don’t know what the path looks like, but I can at least explain why the order is dubious.

For instance, there are certainly human “agents” who are unable to perform the tasks expected at what they call level 2, i.e. those which an unassisted doctorate-level individual is able to do. Given that, what is the reason level 3 is after level 2? Similarly, the ability to coordinate and cooperate is not bound by the ability to function at a very high intellectual level; many organizations have no members with PhDs, but still run grocery stores, taxi companies, or manufacturing plants.

And we’re already seeing work being done on agents that are intended to operate largely independently, performing several days of human work without specific supervision. At present, it seems these agents fail partly because of the limitations of the underlying models, and partly because better structures for these agents are needed. At the very least, though, it’s unclear whether we’d see AI that can innovate effectively (level 4) before or after AI that can successfully work independently (level 3).

So it seems that we have no idea whether GPT-5, whenever they decide to release it, will end up as a level-5-but-not-4 system (an organization that cannot innovate), a level-3-but-not-2 system (an agent without a PhD), or a level-4-but-not-3 system (an innovator that cannot operate independently for multiple days). Of course, it’s possible that all of these objections will be addressed in OpenAI’s full “progress tracking system,” but it seems far more likely that the levels they are talking about are more a marketing technique to sell the idea that their systems will be predictable in their abilities.

I’m deeply skeptical.
