Measuring artificial intelligence on human benchmarks is naive


Central claim: Measured objectively, GPT-4 is arguably way past human intelligence already, perhaps even after taking generality into account.

Central implication: If the reason we’re worried AGI will wipe us out is tied to an objective notion of intelligence—such as the idea that it starts to reflect on its values or learn planning just as it crosses a threshold for cognitive power around human level—we should already update on the fact that we’re still alive.

I don’t yet have a principled way of measuring “generality”,[1] so my intuition just tends to imagine it as “competence at a wide range of tasks in the mammal domain.” This strikes me as comparable to the anthropomorphic notion of intelligence people had back when they thought birds were dumb.

When GPT-2 was introduced, it had already achieved superhuman performance on next-token prediction. We could only hope to out-predict it on a limited set of tokens, heavily prefiltered for precisely what we care most about. For instance, when a human reads a sentence like...

“It was a rainy day in Nairobi, the capital of _”

...it’s obvious to us (for cultural reasons!) that the next word is an exceptionally salient piece of knowledge. So those are the tokens we build our AI benchmarks around. However, GPT cares just as much about predicting ‘capital’ after ‘the’ and ‘rainy’ after ‘It was a’. Its loss function does not discriminate.[2]
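
To make that concrete, here is a toy sketch in plain Python of how a standard pretraining objective treats every position identically. The probabilities are made up for illustration (this is not an actual GPT or its real loss values); the point is only structural: the loss is an unweighted average of per-token negative log-likelihoods, so the ‘Kenya’ prediction counts for exactly as much as the ‘rainy’ prediction.

```python
import math

# Toy illustration (not a real model): hypothetical probabilities a language
# model might assign to each next token in the example sentence.
predictions = [
    ("It",      "was",     0.40),
    ("was",     "a",       0.55),
    ("a",       "rainy",   0.03),   # hard, "boring" token
    ("rainy",   "day",     0.60),
    ("day",     "in",      0.50),
    ("in",      "Nairobi", 0.01),   # hard token
    ("Nairobi", ",",       0.70),
    (",",       "the",     0.30),
    ("the",     "capital", 0.25),
    ("capital", "of",      0.90),
    ("of",      "Kenya",   0.80),   # the token humans find salient
]

# Standard pretraining loss: the unweighted mean of per-token negative
# log-likelihoods. "Kenya" gets exactly the same weight as "rainy".
per_token_nll = [-math.log(p) for _, _, p in predictions]
loss = sum(per_token_nll) / len(per_token_nll)

for (ctx, tok, p), nll in zip(predictions, per_token_nll):
    print(f"P({tok!r} | ...{ctx!r}) = {p:.2f}  ->  contributes {nll:.2f}")
print(f"\nloss = mean of all contributions = {loss:.2f}")
```

Human benchmarks, by contrast, score only the ‘Kenya’-like rows and throw the rest away.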

Now consider these two facts together: a) GPT-4’s loss function does not discriminate between tokens, and b) it still rivals us even on the subset of predictions we ourselves optimise hardest for. What does this imply?

It’s akin to a science fiction author whose only objective is to write better stories, yet who ends up rivalling top scientists in every field as an instrumental side quest.

Make no mistake, next-token prediction is an immensely rich domain, and the sub-problems could be more complex than we know. Human-centric benchmarks vastly underestimate both the objective intelligence and generality of GPTs, unless I’m just confused.

  1. ^

    Incidentally, please share if you know any good definitions of generality, e.g. from information theory or something.

  2. ^

    At least during pretraining, afaik.