Going through previous ten IMOs, and imagining a very impressive automated theorem prover, I think
2020 - unlikely, need 5/6 and probably can’t get problems 3 or 6. Also a good chance of messing up on 4 or 5
2019 - tough but possible, 3 seems hard but even that is not unimaginable, 5 might be hard but might be straightforward, and it can afford to get one wrong
2018 - tough but possible, 3 is easier for machine than human but probably still hard, 5 may be hard, can afford to miss one
2017 - tough but possible, 3 looks out of reach, 6 looks hard but not sure about that, 5 looks maybe hard, 1 is probably easy. But it can miss 2, which could happen.
2016 - probably not possible, 3 and 6 again look hard, and good chance to fail on 2 and 5, only allowed to miss 1
2015 - seems possible, 3 might be hard but like 50-50 it’s simple for machine, 6 is probably hard, but you can miss 2
2014 - probably not possible, can only miss 1, and it would probably miss one of 2 or 5, as well as 6
2013 - probably not possible, 6 seems hard, 2 seems very hard, can only miss 1
2012 - tough but possible, 6 and 3 look hard but you can miss 2
2011 - seems possible, allowed to miss two and both 3 and 6 look brute-forceable
Overall this was much easier than I expected. 4/10 seem unlikely, 4/10 seem tough but possible, and 2/10 I can imagine a machine doing. There are a lot of problems that look really hard, but there are a fair number of tests where you can just skip those.
That said, even to get the possible ones you do need to be surprisingly impressive, and the estimate gets cut down by something like 25-50% for the chance of drawing a solvable test. On the other hand, they get to keep trying (assuming they get promising results in early years), and eventually they will hit one of the easier years.
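The "keep trying until an easy year comes up" point can be made quantitative: if each year's test is independently solvable for the system with some probability, the chance of at least one success over several attempts compounds quickly. A minimal sketch, with made-up per-year probabilities (these are illustrative assumptions, not estimates from the text):

```python
# Hypothetical per-attempt odds that a given year's test is easy enough
# for the system to score gold. These numbers are invented for illustration.
p_years = [0.02, 0.05, 0.10, 0.15]  # e.g. improving over four successive years

# P(at least one success) = 1 - product of per-year failure probabilities,
# assuming independence across years.
p_none = 1.0
for p in p_years:
    p_none *= (1.0 - p)
p_at_least_one = 1.0 - p_none

print(round(p_at_least_one, 3))  # ~0.288 under these made-up numbers
```

Even with modest per-year odds, the cumulative chance over repeated attempts is several times the single-attempt chance, which is the force of the "they get to keep trying" observation.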
It also looks fairly likely to me that if one of DeepMind or OpenAI tries seriously, they will be able to get an honorable mention (HM) with a quite reasonable chance at bronze. That is maybe enough of a PR coup to motivate the work, and then it’s more likely there will be a large subsequent effort to finish the job, or to opportunistically take advantage of an easy test.
Overall I’m feeling bad about my 4%; I deserve to lose some points regardless, but I might think about what my real probability is after looking at the tests (though I was also probably moved by other folks in EA systematically giving higher estimates than I did).
What do you think of DeepMind’s new whoop-de-doo about doing research-level math assisted by GNNs?
Not surprising in any of the ways that good IMO performance would be surprising.