That is a concern, but only in the case where no answer has an argument tree that bottoms out at depth < D. As long as some answer is supported by a tree of depth < D, that answer will beat answers that are supported only by deeper argument trees.
So there is one case where the debaters are not incentivised to be honest: when they know something but there is no human-understandable argument for it that bottoms out in fewer than D steps. This is where the PSPACE constraint comes from.
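A toy model may make the depth argument concrete. The sketch below assumes a judge who can only follow an argument tree if every branch ends in a directly checkable claim within fewer than D levels. `Node`, `verifiable`, and `judge` are hypothetical names chosen for illustration; this is not taken from any existing debate implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One claim in an argument tree (hypothetical toy representation)."""
    claim: str
    children: list["Node"] = field(default_factory=list)
    # Only meaningful for leaves: can a human judge check this claim directly?
    human_checkable: bool = True


def depth(node: Node) -> int:
    """Number of levels in the tree; a lone leaf has depth 1."""
    if not node.children:
        return 1
    return 1 + max(depth(c) for c in node.children)


def bottoms_out(node: Node) -> bool:
    """True if every leaf of the tree is a claim the judge can check directly."""
    if not node.children:
        return node.human_checkable
    return all(bottoms_out(c) for c in node.children)


def verifiable(tree: Node, max_depth: int) -> bool:
    """Assumption: the judge can follow a tree only if it bottoms out in depth < max_depth."""
    return bottoms_out(tree) and depth(tree) < max_depth


def judge(tree_a: Node, tree_b: Node, max_depth: int) -> str:
    """Return 'A' or 'B' if exactly one side's tree is verifiable, else 'tie'."""
    a_ok = verifiable(tree_a, max_depth)
    b_ok = verifiable(tree_b, max_depth)
    if a_ok and not b_ok:
        return "A"
    if b_ok and not a_ok:
        return "B"
    return "tie"  # both or neither verifiable: depth alone does not decide


# Demo: a shallow supporting tree versus a long chain of inference steps.
shallow = Node("answer X", [Node("checkable fact 1"), Node("checkable fact 2")])
deep = Node("answer Y")
node = deep
for i in range(5):
    child = Node(f"step {i}")
    node.children.append(child)
    node = child

print(judge(shallow, deep, max_depth=3))  # → A
```

If neither side has a tree of depth < D, `judge` returns a tie, which is the toy version of the case where honesty is not incentivised.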
If we include cross-examination (which that analysis did not consider), we can get rid of this constraint: each debater commits to an argument tree, and then each debater points out the weakest node in the opponent's tree (or points out that some part of that tree doesn't bottom out).
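A rough sketch of why cross-examination removes the depth limit, under loudly stated assumptions: each leaf carries a hypothetical `judge_confidence` (the judge's credence if they check that claim directly), a leaf that doesn't bottom out counts as maximally weak, and `weakest_point` stands in for the opponent's (assumed powerful) search. The judge only ever evaluates the two flagged nodes, so the committed trees can be much deeper than D. The names and the scoring rule are illustrative, not a specification of any debate protocol.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A claim in a committed argument tree (hypothetical toy representation)."""
    claim: str
    children: list["Node"] = field(default_factory=list)
    # Leaf-only fields (assumptions for this sketch):
    human_checkable: bool = True   # does this leaf bottom out in something the judge can check?
    judge_confidence: float = 1.0  # judge's credence in the leaf claim if checked directly


def weakest_point(node: Node) -> tuple[float, str]:
    """Stand-in for the opponent's search: return the lowest-confidence leaf.

    A leaf that does not bottom out scores 0.0, so it is always the weakest point.
    """
    if not node.children:
        score = node.judge_confidence if node.human_checkable else 0.0
        return (score, node.claim)
    return min(weakest_point(c) for c in node.children)


def cross_examine(tree_a: Node, tree_b: Node) -> str:
    """Each debater flags the weakest node in the opponent's tree; the judge
    checks only those two nodes, however deep the trees are."""
    a_score, _ = weakest_point(tree_a)  # B's challenge to A's tree
    b_score, _ = weakest_point(tree_b)  # A's challenge to B's tree
    if a_score > b_score:
        return "A"
    if b_score > a_score:
        return "B"
    return "tie"


def chain(length: int, leaf: Node) -> Node:
    """Build a linear argument of `length` inference steps ending in `leaf`."""
    node = leaf
    for i in range(length):
        node = Node(f"step {i}", [node])
    return node


honest = chain(50, Node("checkable base fact", judge_confidence=0.95))
dishonest = Node("answer Y", [
    Node("plausible fact", judge_confidence=0.9),
    Node("subtle error", judge_confidence=0.2),
])
unsupported = chain(50, Node("hand-waved step", human_checkable=False))

print(cross_examine(honest, dishonest))  # → A
```

The judge's work here does not grow with tree depth, but finding the weakest point does, which is why the assumption about the debaters' computational power matters.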
(We can only handle really large trees if we assume the debaters are computationally unbounded in general, though. If we don't assume this, then even if we still assume they have oracles for some specific problems, we probably still can't supervise anything outside NP, because of the obfuscated argument problem.)