Here is the argument from the post, more concisely. Hopefully this helps:
It is impossible to lie by saying “I was able to find a proof,” by the construction of the verifier (if you claim you were able to find a proof, the verifier needs to see the proof to believe you). So the only way you can lie is by saying “I was not able to find a proof” when you could have found one if you had really tried. So incentivizing the AI to be honest is precisely the same as incentivizing it to avoid saying “I was not able to find a proof.” Providing such an incentive is not trivial, but it is basically the easiest possible incentive to provide.
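To make the asymmetry concrete, here is a toy sketch of my own (not from the post), using “n is composite” as the statement and a nontrivial factor as the proof. The names `check_proof`, `verifier_accepts`, and `reward` are illustrative, not anything the post defines; the only point is that a “found a proof” claim gets checked, while a “no proof found” claim has to be taken on faith, so rewarding accepted proofs is the whole incentive that is needed.

```python
from typing import Optional

def check_proof(n: int, factor: int) -> bool:
    # Verifier: accept only if `factor` genuinely witnesses that n is composite.
    return 1 < factor < n and n % factor == 0

def verifier_accepts(n: int, claimed_proof: Optional[int]) -> bool:
    # The AI reports either a factor ("I was able to find a proof")
    # or None ("I was not able to find a proof").
    if claimed_proof is not None:
        # A claim of success must survive checking, so it cannot be a lie.
        return check_proof(n, claimed_proof)
    # A claim of failure is always accepted: this is the only direction in which
    # lying is possible (the verifier cannot tell how hard the AI really tried).
    return True

def reward(n: int, claimed_proof: Optional[int]) -> int:
    # The easy incentive: pay only for accepted proofs, so the AI never gains
    # by falsely saying "I was not able to find a proof".
    return 1 if claimed_proof is not None and check_proof(n, claimed_proof) else 0

# Example: an honest report on 91 = 7 * 13 earns the reward; a false "no proof
# found" on 91 is accepted but earns nothing, so there is no upside to the lie.
assert verifier_accepts(91, 7) and reward(91, 7) == 1
assert verifier_accepts(91, None) and reward(91, None) == 0
```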
I know of no easy way to incentivize someone to answer the plain question honestly just based on your ability to punish or reward them when you choose to. Being able to punish them for lying requires being able to tell when they are lying.