IQ tests include sub-tests which can be cardinal, measured on absolute scales. For example: simple & complex reaction time; forwards & backwards digit span; and vocabulary size. (You could also consider tests of factual knowledge.) It would be entirely possible to ask, ‘given that reaction time follows a log-normalish distribution in milliseconds and loads on g at r = 0.X, and assuming measurement invariance, what is the predicted (faster) reaction time of someone Y SDs above the mean on g?’ Or ‘given that backwards digit span is normally distributed...’ This is as concrete and meaningful as grams of protein in maize. (There are other sub-tests, like naming synonyms, telling different stories, or inventing different uses for an object, where there is a clear count you could use, beyond relative comparisons like ‘A got an item right and B got it wrong’.)
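To make that concrete, here is a minimal sketch of the reaction-time extrapolation under a bivariate-normal model of g and log RT. Every parameter value (the mean and SD of log RT, the g-loading) is an illustrative assumption, not measured data:

```python
import math

# Sketch: predicted simple reaction time for someone y SDs above the mean on g,
# assuming log(RT in ms) is normal and loads on g with correlation r.
# All parameter values below are made-up assumptions for illustration.

mu_log, sigma_log = math.log(250), 0.15  # assumed distribution of log(RT/ms)
r = -0.3                                 # assumed g-loading (higher g -> faster RT)
y = 4.0                                  # SDs above the mean on g

# Under bivariate normality, E[log RT | g = y SDs] = mu_log + r * y * sigma_log.
pred_log = mu_log + r * y * sigma_log
print(f"predicted RT: {math.exp(pred_log):.0f} ms")  # geometric-mean prediction
```

The point is just that the prediction comes out in milliseconds, an absolute unit, rather than as a rank against other test-takers.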
Psychometrics has many ways to make tests harder or deal with ceilings. You could speed tests up, for example, allotting someone 30 seconds to solve a problem that takes a very smart person 30 minutes. Or you could set a problem so hard that no one can reliably solve it, and count how many attempts it takes to get right (the more wrong guesses you make and are corrected on, the worse). Or you could make problems more difficult by removing information from them, and count how many hints it takes. (Similar to handicapping in Go.) Or you could remove tools and references, like going from an open-book test to a closed-book test. For some tests, like Raven’s matrices, you can define a generating process that creates new problems by combining a set of rules, giving you a natural objective level of difficulty. Long ago there was an attempt to create an ‘objective IQ test’, usable for any AI system, by testing prediction of small randomly-sampled Turing machines; it never went anywhere AFAIK, but I still think it’s a viable idea.
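As a toy illustration of the ‘generating process’ idea, here is a hypothetical sketch where items are number sequences built by composing k elementary rules, so k serves as an objective difficulty index. The rule set and the sequence format are invented for illustration; a real Raven-style generator would compose visual transformations instead:

```python
import random

# Toy generative test: each item is a number sequence produced by composing
# k elementary rules, so the item has a natural objective difficulty index (k).

RULES = [
    ("add 3",      lambda x: x + 3),
    ("double",     lambda x: x * 2),
    ("subtract 1", lambda x: x - 1),
]

def make_item(k, length=5, seed=None):
    """Compose k randomly chosen rules into one step function; difficulty = k."""
    rng = random.Random(seed)
    chosen = [rng.choice(RULES) for _ in range(k)]

    def step(x):
        for _, f in chosen:
            x = f(x)
        return x

    seq = [rng.randint(1, 5)]
    for _ in range(length):
        seq.append(step(seq[-1]))
    names = " then ".join(name for name, _ in chosen)
    return seq[:-1], seq[-1], names  # shown terms, hidden answer, rule description

shown, answer, rule = make_item(k=2, seed=0)
print(f"sequence {shown} -> ? (rule: {rule}, answer: {answer})")
```

Because difficulty is defined by the generator (how many rules are composed) rather than by how many humans fail the item, the scale doesn’t depend on any norming population, which is exactly what you want once you’re off the end of the human distribution.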
(And you increasingly see all of these approaches being used to try to create benchmarks that can meaningfully measure LLM capabilities for even just the next year or two...)
I think these are good ideas. I still agree with Erick’s core objection that once you’re outside the “normal” human range plus some buffer, IQ as classically understood is no longer a directly meaningful concept, so we’ll have to redefine it somehow, and there are a lot of free parameters in how to define it (e.g., one person’s 250 could be another person’s 600).