I’m wary of the assumption that we can judge “human ability” on a novel task X by observing performance after an hour of practice.
There are some tasks where performance improves with practice but plateaus within one hour. I’m thinking of relatively easy video games, or relatively easy games in general: casual card/board/party games with simple rules and easily discovered optimal policies. But most interesting things that humans “can do” take much longer than an hour to learn.
Here are some things that humans “can do,” but require >> 1 hour of practice to “do,” while still requiring far less exposure to task-specific example data than we’re used to in ML:
Superforecasting
Reporting calibrated numeric credences, a prerequisite for both superforecasting and the GPT game (does this take >> 1 hour? I would guess so, but I’m not sure; see the calibration sketch after this list)
Playing video/board/card games of nontrivial difficulty or depth
Speaking any given language, even when learned during the critical language acquisition period
Driving motor vehicles like cars (arguably) and planes (definitely)
Writing good prose, for any conventional sense of “good” in any genre/style
Juggling
Computer programming (with any proficiency, and certainly e.g. competitive programming)
Doing homework-style problems in math or physics
Acquiring and applying significant factual knowledge in academic subjects like law or history
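To make the second bullet concrete: “calibrated” here means that, among claims you assign ~70% credence, about 70% turn out true. Below is a minimal sketch of how one might check this, in Python with NumPy. The forecast data is hypothetical, and the Brier score and binning scheme are standard generic choices, not anything specific to the GPT game.

```python
import numpy as np

def brier_score(credences, outcomes):
    # Mean squared error between stated probabilities and 0/1 outcomes.
    # Lower is better; always answering 0.5 scores 0.25.
    credences = np.asarray(credences, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    return float(np.mean((credences - outcomes) ** 2))

def calibration_table(credences, outcomes, n_bins=5):
    # For each credence bin, compare the mean stated credence with the
    # observed frequency of the event; they match for a calibrated forecaster.
    credences = np.asarray(credences, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    bin_ids = np.minimum((credences * n_bins).astype(int), n_bins - 1)
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            yield credences[mask].mean(), outcomes[mask].mean(), int(mask.sum())

# Hypothetical forecaster: 200 binary questions with noisy but honest credences.
rng = np.random.default_rng(0)
true_p = rng.uniform(0, 1, 200)
outcomes = rng.random(200) < true_p
stated = np.clip(true_p + rng.normal(0, 0.1, 200), 0.01, 0.99)

print("Brier score:", round(brier_score(stated, outcomes), 3))
for credence, freq, n in calibration_table(stated, outcomes):
    print(f"stated ~{credence:.2f} -> observed {freq:.2f} (n={n})")
```

Getting a table like this to line up on real questions, rather than on toy data, is the part that plausibly takes far more than an hour of practice.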
The last three examples in the list above are the same ones Owain_Evans mentioned in another thread as examples of things LMs do “pretty well on.”
If we only let the humans practice for an hour, we’ll conclude that humans “cannot do” these tasks at the level of current LMs either, which seems clearly wrong (that is, inconsistent with the common-sense reading of terms like “human performance”).
Ok, sounds like you’re using “not too much data/time” in a different sense than I was thinking of; I suspect we don’t disagree. My current guess is that some humans could beat GPT-1 with ten hours of practice, but that beating GPT-2 or larger would be extremely difficult, and plausibly impossible, with any amount of practice.
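For reference, here is one way a “beat GPT-N” comparison might be scored, assuming the game is next-token prediction judged by average log-loss (equivalently, perplexity). That scoring rule is my assumption; the thread doesn’t pin it down, and the numbers below are made up purely for illustration.

```python
import math

def avg_log_loss(probs_on_true_tokens):
    # Average negative log-probability assigned to the tokens that actually
    # occurred; lower is better, and exp(loss) is the perplexity.
    return -sum(math.log(p) for p in probs_on_true_tokens) / len(probs_on_true_tokens)

# Hypothetical probabilities a human and a model each assigned to the
# true next token at five positions in the same test passage.
human = [0.20, 0.05, 0.60, 0.10, 0.30]
model = [0.35, 0.12, 0.55, 0.25, 0.40]

for name, probs in [("human", human), ("model", model)]:
    loss = avg_log_loss(probs)
    print(f"{name}: log-loss {loss:.3f}, perplexity {math.exp(loss):.1f}")
```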
The human brain is internally performing computations very similar to those of transformer LLMs, as one would expect from the prior research indicating strong similarity between DL vision features and primate vision, but that doesn’t mean we can immediately extract those outputs and apply them to game performance.