gpt4o is not literally equivalent to a 115 IQ human.
Use whatever word you want for the concept “score produced when an LLM takes an IQ test”.
But is this comparable to G? Is it what we want to measure?
I have no idea what you want to measure.
I only know that LLMs are continuing to steadily increase in some quality (which you are free to call “fake machine IQ” or whatever you want), and that if they continue to make progress at the current rate there will be consequences, and we should prepare to deal with those consequences.
I think there’s a world where AIs continue to saturate benchmarks and the only consequence is that the companies get to say they saturated those benchmarks.
Especially at the tails of those benchmarks, I imagine the gains won’t translate into the things we actually care about, like general reasoning, the ability to act autonomously, etc.
On a metaphysical level I am completely on board with “there is no such thing as IQ. Different abilities are completely uncorrelated. Optimizing for metric X is uncorrelated with desired quality Y...”
On a practical level, however, I notice that every time OpenAI announces they have a newer shinier model, it both scores higher on whatever benchmark and is better at a bunch of practical things I care about.
Imagine there were a theoretically correct metric called the_thing_logan_actually_cares_about. I notice in my own experience a strong correlation between “fake machine IQ” and the_thing_logan_actually_cares_about. I further note that if one makes a linear fit against:
Progress_over_time + log(training flops) + log(inference flops)
then it nicely predicts both the_thing_logan_actually_cares_about and “fake machine IQ”.
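For concreteness, here is a minimal sketch of that fit in Python. Every number in it is invented (the flop counts, the noise levels, the scores are all placeholders), so this is a toy illustration of the regression's shape, not a claim about real models:

```python
import numpy as np

# All numbers below are made up for illustration; the point is the shape
# of the fit, not the data. Eight hypothetical model generations.
rng = np.random.default_rng(0)
n = 8
progress_over_time = np.arange(n, dtype=float)  # release index, 0..7

# Hypothetical compute figures, jittered so the columns aren't collinear.
log_training_flops = np.log(10.0) * (23 + progress_over_time + rng.normal(0, 0.3, n))
log_inference_flops = np.log(10.0) * (9 + 0.5 * progress_over_time + rng.normal(0, 0.2, n))

# Design matrix: intercept plus the three predictors named above.
X = np.column_stack([np.ones(n), progress_over_time,
                     log_training_flops, log_inference_flops])

# Two synthetic outcomes, both (noisily) improving across generations.
fake_machine_iq = 90 + 3.0 * progress_over_time + rng.normal(0, 1.5, n)
the_thing_logan_actually_cares_about = (
    10 + 2.5 * progress_over_time + rng.normal(0, 1.5, n))

for name, y in [("fake machine IQ", fake_machine_iq),
                ("the_thing_logan_actually_cares_about",
                 the_thing_logan_actually_cares_about)]:
    # Ordinary least squares fit of y against the design matrix.
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    r2 = 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
    print(f"{name}: R^2 of the linear fit = {r2:.3f}")
```

On this toy data both fits come out with R² near 1, which is just the claim restated: anything roughly linear in release date and log-compute predicts both quantities about equally well.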
This reminds me of the LessWrong post “If It’s Worth Doing, It’s Worth Doing With Made-Up Statistics”: https://www.lesswrong.com/posts/9Tw5RqnEzqEtaoEkq/if-it-s-worth-doing-it-s-worth-doing-with-made-up-statistics