I specifically disagree with the IQ part and the Codeforces part. Meaning, I think they're misleading.
IQ and coding ability are useful measures of intelligence in humans because they correlate with a bunch of other things we care about. That's not to say it's useless to measure "IQ" or coding ability in LLMs, but presenting them as if they mean anything like what they mean in humans is wrong, or at least will give many readers the wrong impression.
As for the overall point of this post: I roughly agree. I think the timelines are not too unreasonable, and the tri-/quadrilemma you set up can be a useful framing. I mostly disagree with using the metrics you lead with to quantify any of this. I think we should look at the specific abilities current models have or lack that are necessary for the scenarios you outlined, and how soon we're likely to get them. But you do go through that somewhat in the post.
I think even with humans, IQ isn't the best measure to quantify what we call intelligence. The way I tend to think of it is that high general intelligence correlates with higher IQ test scores, but optimizing performance on IQ tests doesn't necessarily make you more intelligent outside of that task.
But I'm okay with using IQ scores in the context of this post, because they seem useful for capturing the change in these models' capabilities.
It doesn’t sound like we disagree at all.