Comparing IQ and codeforces doesn’t make much sense. Please stop doing this.
Attaching IQs to LLMs makes even less sense. Except as a very loose metaphor. But please also stop doing this.
Is your disagreement specifically with the word “IQ”, or with the broader point that AI is continuing to make progress at a steady rate, which implies things are going to happen soon-ish (2-4 years)?
If specifically with IQ, feel free to replace the word with “abstract units of machine intelligence” wherever appropriate.
If with “big things soon”, care to make a prediction?
I specifically disagree with the IQ part and the codeforces part. Meaning, I think they’re misleading.
IQ and coding ability are useful measures of intelligence in humans because they correlate with a bunch of other things we care about. Not to say it’s useless to measure “IQ” or coding ability in LLMs, but presenting them as if they mean what they mean in humans is wrong, or at least will give many people reading it the wrong impression.
As for the overall point of this post: I roughly agree. I think the timelines are not too unreasonable, and the tri/quad lemma you put up can be a useful framing. I mostly disagree with using the metrics you put up first to quantify any of this. I think we should look at the specific abilities current models have or lack that are necessary for the scenarios you outlined, and how soon we’re likely to get them. But you do go through that somewhat in the post.
It doesn’t sound like we disagree at all.
I think even with humans, IQ isn’t the best measure to quantify what we call intelligence. The way I tend to think of it is that high general intelligence correlates with higher IQ test scores, but just optimizing performance on IQ tests doesn’t necessarily mean that you become more intelligent in general outside of that task.
But I’m okay with the idea of using IQ scores in the context of this post because it seems useful to capture the change in capabilities of these models.
By calling it “IQ”, you were (EDIT: the creator of that table was) saying that gpt4o is comparable to a 115 IQ human, etc. If you don’t intend that claim, if that replacement would preserve your meaning, you shouldn’t have called it IQ. (IMO that claim doesn’t make sense — LLMs don’t have human-like ability profiles.)
gpt4o is not literally equivalent to a 115 IQ human.
Use whatever word you want for the concept “score produced when an LLM takes an IQ test”.
But is this comparable to G? Is it what we want to measure?
I have no idea what you want to measure.
I only know that LLMs are continuing to steadily increase in some quality (which you are free to call “fake machine IQ” or whatever you want), and that if they continue to make progress at the current rate there will be consequences, and we should prepare to deal with those consequences.
I think there’s a world where AIs continue to saturate benchmarks and the consequence is just that the companies get to say they saturated those benchmarks.
Especially at the tails of those benchmarks, I imagine it won’t be about the consequences we care about, like general reasoning, the ability to act autonomously, etc.
On a metaphysical level, I am completely on board with “there is no such thing as IQ. Different abilities are completely uncorrelated. Optimizing for metric X is uncorrelated with desired quality Y...”
On a practical level, however, I notice that every time OpenAI announces a newer, shinier model, it both scores higher on whatever benchmark and is better at a bunch of practical things I care about.
Imagine there was a theoretically correct metric called the_thing_logan_actually_cares_about. I notice in my own experience there is a strong correlation between “fake machine IQ” and the_thing_logan_actually_cares_about. I further note that if one makes a linear fit against:
Progress_over_time + log(training flops) + log(inference flops)
It nicely predicts both the_thing_logan_actually_cares_about and “fake machine IQ”.
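A minimal sketch of the kind of linear fit described above; every number below is made up purely for illustration, not real model data:

```python
# Toy illustration: ordinary least squares of a benchmark score against
# progress-over-time, log(training flops), and log(inference flops).
# All values are invented for the sketch.
import numpy as np

years_since_2020 = np.array([2.5, 3.0, 3.5, 4.0, 4.5])          # "progress over time"
log_train_flops = np.log10(np.array([1e24, 3e24, 1e25, 3e25, 1e26]))
log_infer_flops = np.log10(np.array([1e9, 2e9, 5e9, 1e10, 3e10]))
score = np.array([100.0, 108.0, 115.0, 121.0, 130.0])            # made-up "fake machine IQ"

# Design matrix: intercept plus the three predictors from the comment.
X = np.column_stack([
    np.ones_like(years_since_2020),
    years_since_2020,
    log_train_flops,
    log_infer_flops,
])

coef, *_ = np.linalg.lstsq(X, score, rcond=None)  # least-squares fit
predicted = X @ coef

print("fitted coefficients:", np.round(coef, 2))
print("predicted vs actual:", list(zip(np.round(predicted, 1), score)))
```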
This reminds me of this LessWrong post.
If It’s Worth Doing, It’s Worth Doing With Made-Up Statistics
https://www.lesswrong.com/posts/9Tw5RqnEzqEtaoEkq/if-it-s-worth-doing-it-s-worth-doing-with-made-up-statistics