I agree that this would be a more interesting setup. But why do you see it as necessary to validate the ‘weak supervising strong’ hypothesis?
If the correct-side debater uses invalid claims as part of its arguments, and the judge fails to catch this… it would make me feel that something was amiss. That perhaps this wasn’t a good proxy for a high-stakes debate between competent debaters trying to convince a smart and motivated human judge of facts about the world.
And if, given the full set of cited sources from both sides of the debate, the judge is able to consistently come to the correct answer, then the question isn’t hard enough.