I don’t think I understand the “AI IQ” argument (or I disagree somewhere).
What goes wrong if we aim to build a series of high-quality benchmarks to assess how good AIs are at various tasks and then use these to track AI progress? (Tasks such as coding, small research projects, persuasion and manipulation, bioweapons, the ability to bypass safety countermeasures.)
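(To make the proposal concrete, here's a minimal, purely illustrative sketch of what "use benchmarks to track progress against thresholds" could look like. Every task name, score, and threshold below is made up; this is a gesture at the shape of the idea, not a claim about how any existing evaluation works.)

```python
# Purely illustrative sketch: record each model's best-effort benchmark scores
# and flag any pre-registered capability thresholds that have been crossed.
# All task names, scores, and thresholds are hypothetical.

# Hypothetical "red line" scores for each task category.
THRESHOLDS = {
    "coding": 0.80,
    "small_research_projects": 0.60,
    "persuasion_and_manipulation": 0.50,
    "bioweapons_uplift": 0.30,
    "bypass_safety_countermeasures": 0.40,
}

# Hypothetical results after best-effort elicitation
# (fine-tuning, scaffolding, prompting).
RESULTS = {
    "model-v1": {"coding": 0.55, "persuasion_and_manipulation": 0.35},
    "model-v2": {"coding": 0.83, "persuasion_and_manipulation": 0.48},
}


def crossed_thresholds(results, thresholds):
    """Return (model, task, score) triples where a score meets or exceeds its threshold."""
    return [
        (model, task, score)
        for model, scores in results.items()
        for task, score in scores.items()
        if task in thresholds and score >= thresholds[task]
    ]


if __name__ == "__main__":
    for model, task, score in crossed_thresholds(RESULTS, THRESHOLDS):
        print(f"{model} crossed the {task} threshold (score {score:.2f})")
```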
Here’s my guess at the list of things that could go wrong (roughly speaking):
1. We don't actually create the benchmarks (or the quality and diversity of the benchmarks is inadequate).
2. There are no good thresholds short of "capable of taking over" that haven't already been surpassed.
3. Capability gains (including those from fine-tuning and scaffolding) will be highly discontinuous, so all important thresholds will be crossed at roughly the same time.
4. AIs will sandbag on these benchmarks undetectably, so we won't know their true capabilities.
5. People won't elicit AI capabilities on these benchmarks to a sufficiently competitive extent (competitive both with what the AI itself could do if it tried and with what further elicitation work like fine-tuning, scaffolding, and prompting will achieve); see the sketch after this list.
6. We'll see lines being crossed, but people will just move the goalposts.
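As a gesture at why items 4 and 5 seem checkable in principle, here's a minimal, purely hypothetical sketch: compare a weakly elicited run (plain prompting) against a strongly elicited run (fine-tuning plus scaffolding) on the same benchmark, and treat a large gap as evidence of under-elicitation or possible sandbagging. All numbers and the alarm tolerance below are made up.

```python
# Purely illustrative: measure the "elicitation gap" between a weakly elicited
# run (plain prompting) and a strongly elicited run (fine-tuning + scaffolding)
# on the same benchmark. A large gap suggests under-elicitation or sandbagging.
# All scores and the alarm tolerance are hypothetical.

WEAK_ELICITATION = {"coding": 0.42, "persuasion_and_manipulation": 0.20}
STRONG_ELICITATION = {"coding": 0.78, "persuasion_and_manipulation": 0.46}

GAP_ALARM = 0.25  # made-up tolerance for how large a gap we tolerate


def elicitation_gaps(weak, strong, alarm=GAP_ALARM):
    """Return tasks whose strongly elicited score exceeds the weak one by more than `alarm`."""
    return {
        task: strong[task] - weak[task]
        for task in strong
        if task in weak and strong[task] - weak[task] > alarm
    }


if __name__ == "__main__":
    for task, gap in elicitation_gaps(WEAK_ELICITATION, STRONG_ELICITATION).items():
        print(f"Large elicitation gap on {task}: +{gap:.2f}")
```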
These all seem to me like problems that are either surmountable or reasonably unlikely.
(COI: I’m working on a research agenda that heavily relies on capabilities evaluation, and I’m friendly with many people at ARC Evals and other orgs who are more optimistic about capabilities evaluations than this post seems to suggest.)