Now I’m immediately going to this thought chain:
Maybe such an IQ test could be designed. --> Ah, but then specialized AIs (debatably “all of them” or “most of the non-LLM ones”) would just fail equally! --> Maybe that’s okay? Like, you don’t give a blind kid a typical SAT, you give them a braille SAT or a reader or something (or, often, no accommodation).
The capabilities-in-the-wrong-order point is spot-on.
I could also imagine a thing that’s not so similar to IQ in the details, but still measures some kind of generic “information processing/retention/[other activities] capability”… but, as noted, we already can’t predict capabilities, so such a measure would need to either solve that problem or else be less useful than, e.g., parameter count. --> If we classified and properly understood a taxonomy of capabilities, that’d help build such a measure! --> That task, itself, is probably really bloody difficult compared with “some hypothetical tactic that attacks the ‘IQ-style fire-alarm’ narrative head-on / in a different way”.
(A toy demo of “the IQ-style fire alarm won’t come” could be a good subtask of “toy demo of misalignment that actually convinces people”… OR it could end up as a “toy demo of capabilities that just pushes people to work more on capabilities”, which is obviously bad.)