I do not think the ratio of the “AI solves hardest problem” and “AI has Gold” probabilities is right here. Paul was at the IMO in 2008, but he might have forgotten some details...
(My qualifications here: high IMO Silver in 2016, but more importantly I was a Jury member on the Romanian Master of Mathematics recently. The RMM is considered the harder version of the IMO, and shares a good part of the Problem Selection Committee with it.)
The IMO Jury does not treat a problem’s “bashability” as a disqualifying factor when the bash would take good contestants more than a few hours. For a dedicated bashing program, though, that time barrier is irrelevant.
An “AI” that solves most IMO geometry problems is very likely possible today; the main difficulty is converting the problem text into an algebraic statement. Once that translation is done, polynomial system solvers should tackle such problems easily.
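To make the “bash” concrete, here is a minimal sketch of the coordinate approach, assuming sympy as the computer algebra system (the sample statement and the setup are my illustration, not any particular solver’s pipeline). Once a geometry claim is translated into coordinates, proving it reduces to polynomial algebra:

```python
# Minimal sketch: "bashing" a geometry statement with a computer algebra
# system (assumes sympy). Claim: the perpendicular bisectors of two sides
# of a triangle meet at a point equidistant from all three vertices.
import sympy as sp

bx, cx, cy = sp.symbols('bx cx cy', real=True)  # cy != 0: non-degenerate triangle
A, B, C = sp.Matrix([0, 0]), sp.Matrix([bx, 0]), sp.Matrix([cx, cy])

x, y = sp.symbols('x y', real=True)
P = sp.Matrix([x, y])

def sq_dist(U, V):
    d = U - V
    return d.dot(d)

# P lies on the perpendicular bisector of AB iff |PA|^2 = |PB|^2; same for AC.
sol = sp.solve([sp.Eq(sq_dist(P, A), sq_dist(P, B)),
                sp.Eq(sq_dist(P, A), sq_dist(P, C))], [x, y], dict=True)[0]
O = P.subs(sol)  # the circumcenter, as a rational function of the coordinates

# Both differences simplify to 0, i.e. OA = OB = OC identically.
print(sp.simplify(sq_dist(O, A) - sq_dist(O, B)))  # 0
print(sp.simplify(sq_dist(O, B) - sq_dist(O, C)))  # 0
```

Harder configurations need nondegeneracy handling and Gröbner-basis machinery rather than plain `solve`, but the pipeline is the same.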
Say the order of the problems is (Day 1: CNG, Day 2: GAC). The geometry solver gives you 2 × 7 = 14 points. For a shot at IMO Gold (the cutoff is usually somewhere in the mid-to-high twenties out of 42), you also have to solve the easiest combinatorics problem, plus one of either algebra or number theory, for 14 + 7 + 7 = 28 points.
Given the recent progress on coding problems, as with AlphaCode, I place over 50% probability on IMO #1/#4 combinatorics problems being solvable by 2024. If that turns out to be true, the “AI has Gold” event becomes “AI solves a medium N or a medium A problem, or both if contestants find them easy and the cutoff rises”.
Now, as noted elsewhere in the thread, there are various types of N and A problems that we might consider “easy” for an AI. Several IMOs in the last ten years contain such problems.
In 2015, the easiest five problems consisted of: two bashable G problems (#3, #4), an easy C (#1), a diophantine equation N (#2) and a functional equation A (#5). Given such a problem set, a dedicated AI might be able to score 35 points (five full solutions) without capabilities anywhere near what the combinatorics #6 demands.
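To illustrate why that N #2 looks tractable for a dedicated system: IMO 2015 Problem 2 asks for all positive integers a, b, c such that ab − c, bc − a and ca − b are all powers of 2. A hedged brute-force sketch (the search LIMIT below is my arbitrary choice) recovers the full solution set as a conjecture, which is the creatively hard step; certifying that the list is complete is then a separate task:

```python
# Brute-force search for IMO 2015 Problem 2: positive integers a <= b <= c
# with ab - c, bc - a, ca - b all powers of 2.

def is_power_of_two(n: int) -> bool:
    return n > 0 and n & (n - 1) == 0

LIMIT = 60  # search box; an assumption, not derived from the problem
solutions = [
    (a, b, c)
    for a in range(1, LIMIT)
    for b in range(a, LIMIT)
    for c in range(b, LIMIT)
    if is_power_of_two(a * b - c)
    and is_power_of_two(b * c - a)
    and is_power_of_two(c * a - b)
]
print(solutions)  # (2, 2, 2), (2, 2, 3), (2, 6, 11), (3, 5, 7): the known answer, up to order
```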
The only way the Gold probability could be comparable to the “hardest problem” probability is if the bet only takes general problem-solving models into account. Otherwise, the inductive bias one could build into such a model (e.g. falling back on a dedicated diophantine equation solver) helps much more with Gold than with the hardest problem.
I think this is quite plausible. Also see toner’s comment in the other direction, though. Both probabilities are somewhat high because there are lots of easy IMO problems. Like you, I think “hardest problem” is quite a bit harder than a gold, though it seems you think the gap is larger (and it sounds like you put a much higher probability on an IMO gold overall).
Overall I think that AI can solve most geometry problems and 3-variable inequalities for free, and many functional equations and diophantine equations seem easy. And I think the easiest problems are also likely to be soluble. In some years this might let it get a gold (e.g. 2015 is a good year), but typically I think it’s still going to be left with problems that are out of reach.
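For the inequalities claim, here is a hedged illustration of why three-variable inequalities can come “for free”: many of them admit sum-of-squares (SOS) certificates, and checking a certificate is purely mechanical algebra (assumes sympy; the example and its certificate are standard, not taken from any specific solver):

```python
# Sketch: verifying a sum-of-squares (SOS) certificate with sympy.
# Claim: a^2 + b^2 + c^2 >= ab + bc + ca for all real a, b, c.
import sympy as sp

a, b, c = sp.symbols('a b c', real=True)
difference = a**2 + b**2 + c**2 - (a*b + b*c + c*a)

# SOS certificate: the difference equals half a sum of squares, hence >= 0.
certificate = sp.Rational(1, 2) * ((a - b)**2 + (b - c)**2 + (c - a)**2)

print(sp.expand(difference - certificate))  # 0, so the inequality holds
```

Finding such a certificate in general is a semidefinite programming problem, which is exactly the kind of search a dedicated program can do.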
I would put a lower probability on “hardest problem” if we were actually able to hand-pick a really hard problem; the main risk is that the hardest problem, defined this way, will sometimes also be bashable for one reason or another.