At tasks like “give a winning chess move”, we can generate high quality synthetic data so that it’s likely that we can finetune model performance to exceed top human intuitive play.
With some more effort, this also applies to “prove this mathematical conjecture” (using automated proof checkers like Lean) and (with suitably large and well-deigned automated test suites) also to “write code to solve this problem”. These seem like areas broad enough that scaling them up to far superhuman levels, as well as being inherently useful, might also carry over towards other tasks requiring rational and logical thinking. Also, this would probably be ab ideal forum in which to work on solutions to the ‘drunkenness’ issue.
With some more effort, this also applies to “prove this mathematical conjecture” (using automated proof checkers like Lean) and (with suitably large and well-deigned automated test suites) also to “write code to solve this problem”. These seem like areas broad enough that scaling them up to far superhuman levels, as well as being inherently useful, might also carry over towards other tasks requiring rational and logical thinking. Also, this would probably be ab ideal forum in which to work on solutions to the ‘drunkenness’ issue.