I tested it on 3 held-out problems and it got 1/3. That's significant progress, and it increases the chance these can be solved with prompting. So it's partially a question of whether any major LLMs incorporate better auto-prompting.