All the discussion I’ve seen so far has failed to mention what I consider the most important “baseline” in AlphaGeo paper: how well did their symbolic proof search framework do with just some GOFAI-style hard-coded heuristics, without the neural net? Extended data fig 6b:
So apparently they were able to solve 213⁄231 problems even without any neural net? (IIUC “DD+AR” is their symbolic reasoning library.) The net added another 15⁄231 problems, probably moving the system from ~just below silver medal to ~just below gold medal.
My main takeaway is that the geo problems are pretty easy for search at the scale computers can bring to bear, and this team was the first to throw any serious engineering effort at it. The contribution of the neural net was relatively small, and I’d guess that similar marginal engineering effort invested in some manner other than a neural net could have achieved results just as good.
All the discussion I’ve seen so far has failed to mention what I consider the most important “baseline” in AlphaGeo paper: how well did their symbolic proof search framework do with just some GOFAI-style hard-coded heuristics, without the neural net? Extended data fig 6b:
So apparently they were able to solve 213⁄231 problems even without any neural net? (IIUC “DD+AR” is their symbolic reasoning library.) The net added another 15⁄231 problems, probably moving the system from ~just below silver medal to ~just below gold medal.
My main takeaway is that the geo problems are pretty easy for search at the scale computers can bring to bear, and this team was the first to throw any serious engineering effort at it. The contribution of the neural net was relatively small, and I’d guess that similar marginal engineering effort invested in some manner other than a neural net could have achieved results just as good.
The NN seems to make a bigger difference for actual IMO problems, going from 18⁄30 to 25⁄30, but maybe it’s overfit.
Interesting. Do you know where in the paper that was?
See Table 1. In particular, the comparison between “DD + AR + human-designed heuristics” and “AlphaGeometry”.