First: You seem to be using two rather different notions of “IQ”. In one paragraph, you give the modest criterion that human IQ satisfies:
By evaluating someone’s learning skills on a couple of fields at random, you can get a picture of how hard it will be for that person to learn skills from other fields. That picture will be incomplete, but it will give you some non-trivial information.
But then in the next section, you seem to be saying that if AI IQ were a coherent concept, it would have to satisfy much more stringent criteria than the above. No. The concept of AI IQ would be roughly “as models get better at task X, this will correlate with them getting better at many other tasks”. Not every single other task, and you can’t necessarily predict very much about specific tasks; nor does it make sense to assume that everything that’s true about “if humans can do X then they can do Y” carries over straightforwardly to AIs. Rather, as you say, knowing that the AI is good at task X gives you some non-trivial information about its performance at other tasks. And I believe this is in fact generally true; didn’t GPT-4 demonstrate this (improvement across many different tasks) in spades, and wasn’t that what got a lot of people worried about it?
As researchers moved from GPT-2 to GPT-3, they were discovering abilities, often being surprised by them as the model grew. This is not what it looks like to have identified a g-factor!
Er… they found that a bunch of new abilities showed up together while they were working on improving other stuff. That sounds like evidence for a g-factor: a correlation between abilities. Whether a g-factor exists is independent of whether humans have figured out how to measure it or what its implications are.
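To make “a correlation between abilities” concrete, here is a minimal sketch of the kind of check that notion implies. All of it is invented for illustration: the scores, the models, and the benchmarks are made up, and this is not an analysis of any real evaluation data.

```python
# Toy sketch: do task scores across models share one dominant factor?
# All numbers below are invented purely for illustration.
import numpy as np

# Rows: hypothetical models of increasing scale; columns: hypothetical benchmarks.
scores = np.array([
    [0.21, 0.18, 0.25, 0.15],   # small model
    [0.43, 0.39, 0.47, 0.35],   # medium model
    [0.68, 0.61, 0.72, 0.58],   # large model
    [0.85, 0.80, 0.88, 0.74],   # larger model
])

# Correlate the tasks with each other across models.
corr = np.corrcoef(scores, rowvar=False)

# A g-factor claim is roughly: one dominant factor carries most of the shared
# variance. Crude check: the share of variance on the correlation matrix's
# largest eigenvalue.
eigvals = np.linalg.eigvalsh(corr)
g_share = eigvals[-1] / eigvals.sum()
print(f"variance explained by the first factor: {g_share:.0%}")
```

With toy data where every task improves in lockstep, nearly all the variance loads on one factor; whether real evaluations look like that is an empirical question, but “new abilities showing up together” is exactly the pattern that pushes that number up.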
Second: The “we’ll have some warning shots before it’s too late” hypothesis essentially depends on two propositions: (a) that AIs will develop the capability to do dangerous-but-not-world-dominating things before they develop the capability to dominate the world, and (b) that we’ll detect the first kind of capability (possibly by it being exercised) before the second kind is exercised. (b) comes down to detection efforts and luck. (a) is essentially a set of propositions of the form “It’s harder to teach an AI to hack into everything on the internet, train bigger versions of itself while concealing its operations, and parlay that into world domination, than to teach an AI to find a few zero-day RCE exploits in commonly used software, which some malicious humans then use for a large ransomware or sabotage campaign”. You could kind of say that AI IQ is related to (a), but...
Actually, the stronger you think the AI IQ concept is (i.e. the more correlated you think the skills will be), the more likely you are to believe that dangerous skill X will correlate with dangerous skills Y and Z, which together form world-domination kit W; and that would make it less likely that you’d observe the dangerous skills in isolation before the AI killed us. So the more you believe in an AI g-factor, the less confident you should be in the “there will be warning shots” approach. Yet you say there is no AI g-factor, and that those who believe in the AI g-factor are overconfident in warning shots?
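Here is a toy simulation of that point, under assumptions I am making up purely for illustration: three “dangerous skills” modelled as jointly Gaussian latent abilities with pairwise correlation rho, and a skill counting as “present” once it crosses an arbitrary threshold. Nothing here models real AI capabilities; it only shows the direction of the effect.

```python
# Toy simulation: the more correlated the dangerous skills, the less often you
# see one of them in isolation before the whole kit is present.
import numpy as np

rng = np.random.default_rng(0)
threshold = 1.0        # arbitrary "capability is present" cutoff
n_samples = 200_000

for rho in (0.0, 0.5, 0.9):
    # Pairwise correlation rho between the three latent skill levels.
    cov = np.full((3, 3), rho)
    np.fill_diagonal(cov, 1.0)
    skills = rng.multivariate_normal(np.zeros(3), cov, size=n_samples)

    present = skills > threshold
    any_skill = present.any(axis=1)    # at least one dangerous skill present
    full_kit = present.all(axis=1)     # all of them: the "world domination kit"

    # P(full kit | at least one dangerous skill is present)
    p = full_kit[any_skill].mean()
    print(f"rho={rho:.1f}: P(full kit | some dangerous skill) = {p:.3f}")
```

With independent skills, “at least one dangerous skill is present” almost never means the whole kit is present, so there is a wide window for warning shots; crank the correlation up and that window shrinks.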
Your post seems to make the most sense if I assume that by “AI IQ” you mean “the belief that everything we know about human IQ levels can be directly applied to AIs”, and that by “there is no AI g-factor” you mean “humans haven’t figured out how to measure the AI g-factor or what its implications are”. But then I think you’re beating up strawmen.