Isn’t this way of solving just Goodharting the metric and actually pushing the LLM away from being “General Intelligence”?
I certainly agree that solving ARC-AGI in this way wouldn’t indicate general intelligence. (And, I generally think that ARC-AGI probably doesn’t track general intelligence that well. And it tracks even less when no-holds-barred methods are considered.)
But doing so would tune that GPT-4o to be less good at other tasks, wouldn’t it?
I don’t think it would degrade performance on other tasks very much if you took basic precautions. LLMs have a lot of parameters.