My interpretation is pretty similar, though perhaps broader and more task-based in ways that probably don’t matter much: something like performing between the 10th and 100th percentile of human capability at the cognitive tasks relevant to the context under discussion. A calculator certainly doesn’t qualify, and it fails in both directions. It performs worse than humans at almost every cognitive task, and in contexts that refer only to certain very narrow arithmetic tasks, it performs superhumanly.
In a narrow Diplomacy-playing context, the recent bots are definitely human-level: better than many humans (including the bottom 10% of people who have ever played the game), but not better than the best. Good chess programs are superhuman in their narrow domain, but utterly subhuman at everything else.
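The percentile reading above can be made concrete with a rough sketch. Everything here is an illustrative assumption rather than an established metric: the function name `classify`, the `floor_pct` parameter, and the idea that a task can be reduced to a single numeric score are all hypothetical, and the label only applies to one task in one context.

```python
from bisect import bisect_right


def classify(system_score, human_scores, floor_pct=10.0):
    """Label a system relative to a sample of human scores on ONE task.

    Hypothetical helper: higher scores are assumed to be better, and
    floor_pct is the assumed lower bound of the "human-level" band.
    """
    if not human_scores:
        raise ValueError("need at least one human score")
    ranked = sorted(human_scores)
    # Beating the best human in the sample counts as superhuman.
    if system_score > ranked[-1]:
        return "superhuman"
    # Percentile: share of humans scoring at or below the system.
    pct = 100.0 * bisect_right(ranked, system_score) / len(ranked)
    return "human-level" if pct >= floor_pct else "subhuman"
```

Running this per task gives one label per task, so a calculator would come out "superhuman" on narrow arithmetic and "subhuman" nearly everywhere else, which is why it doesn’t qualify in either direction.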
State-of-the-art LLMs broadly display roughly human-level intelligence within their context, but with certain strengths and weaknesses somewhat outside the human range.
This multidimensionality is exactly why I think the term “human-level intelligence” should not be used. My impression is that it suggests a one-dimensional type of ability, with a threshold at which the quality changes drastically; and the term even seems to suggest that this threshold lies at a level that is in fact not decisive.
Yes, that’s fair enough. It’s not like we have any examples of systems that have human-level intelligence in a broad context for the term to apply to anyway.
I do still think it’s a useful term for hypothetical discussions, referring to systems that are neither obviously subhuman nor obviously superhuman in broad capabilities. It is possible that such systems may never exist. If we develop superintelligence, it may be via systems that are always obviously subhuman in some respects and superhuman in others, or via a discontinuity in capability, or via other even stranger possibilities.