I agree timescale is a good way to think about this. My intuition is that if high school math problems are 1, then IMO math problems are 100 (1e2) and typical research math problems are 10,000 (1e4). So IMO sits exactly halfway between the two on a log scale! I don’t have first-hand experience with the hardest research math problems, but from what I’ve heard about their timescale, they seem to reach 1,000,000 (1e6). I’d rate typical practical R&D problems at 1e3 and transformative R&D problems at 1e5.
Edit: Using this scale, I rate GPT-3 at 1 and GPT-4 at 10. Extrapolating the same 10x jump suggests GPT-5 at 100, i.e. IMO level, which feels uncomfortable to me! Thinking about this, while there is lots of 1-data and 10-data, there is considerably less 100-data, and above that most things are not written down. But maybe that is an excuse and it doesn’t matter.
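For anyone who wants to check the arithmetic, here is a rough sketch of the scale in Python. The ratings are just the intuitions above, and the helper names (`log_midpoint`, `extrapolate`) are made up for illustration. The extrapolation assumes each model generation makes the same multiplicative jump as the last one, which is a guess, not something established.

```python
import math

# Difficulty ratings from the comment above (high school math = 1).
scale = {
    "high school math": 1,
    "IMO math": 1e2,
    "typical practical R&D": 1e3,
    "typical research math": 1e4,
    "transformative R&D": 1e5,
    "hardest research math": 1e6,
}


def log_midpoint(a, b):
    """Geometric mean: the halfway point between a and b on a log scale."""
    return math.sqrt(a * b)


def extrapolate(prev_rating, curr_rating):
    """Next rating, ASSUMING the same multiplicative jump repeats."""
    return curr_rating * (curr_rating / prev_rating)


# Halfway (in log terms) between high school math and typical research math.
midpoint = log_midpoint(scale["high school math"], scale["typical research math"])

# GPT-3 rated 1 and GPT-4 rated 10, so the next step under this assumption:
gpt5_guess = extrapolate(1, 10)

print(midpoint, gpt5_guess)
```

Both numbers come out at the IMO rating of 1e2, which is where the "exactly halfway" and "GPT-5 for IMO" remarks come from.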