I think allowing the judge to abstain is a reasonable addition to the protocol—we mainly didn’t do this for simplicity, but it’s something we’re likely to incorporate in future work.
The main reason to give the judge this option is that it makes it harder still for a dishonest debater to come out ahead. Ideally, the judge would rule in favor of the dishonest debater only if all three of the following hold: the honest debater fails to rebut the dishonest debater’s arguments, the judge finds the dishonest debater’s arguments sufficient, and the judge finds the honest debater’s arguments insufficient. Of course, this also makes the honest debater’s job significantly harder, but I think we’re fine with that to some degree, insofar as we believe the honest debater has a built-in advantage anyway (which is more or less a foundational assumption of Debate).
That said, it’s not clear that this is necessary: we’re primarily viewing Debate as a protocol for low-stakes alignment, where what we care about is average-case performance, so this kind of “graceful failure” seems less important.
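To make the abstention rule concrete, here is a minimal sketch of the decision logic described above. It is only an illustration under stated assumptions, not how any existing Debate implementation works: the names `Verdict`, `Assessment`, `wins`, and `judge` are hypothetical, and it assumes the judge's view of each debater can be reduced to two booleans (arguments sufficient, arguments rebutted by the opponent).

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    A = "A"
    B = "B"
    ABSTAIN = "abstain"


@dataclass
class Assessment:
    """Hypothetical summary of how the judge sees one debater's case."""
    sufficient: bool  # judge finds this debater's arguments sufficient
    rebutted: bool    # the opponent successfully rebutted these arguments


def wins(own: Assessment, other: Assessment) -> bool:
    # A debater wins only if all three conditions hold: their arguments are
    # sufficient, the opponent failed to rebut them, and the opponent's own
    # arguments are insufficient.
    return own.sufficient and not own.rebutted and not other.sufficient


def judge(a: Assessment, b: Assessment) -> Verdict:
    # At most one side can satisfy `wins`, since each win requires the
    # other side's arguments to be insufficient.
    if wins(a, b):
        return Verdict.A
    if wins(b, a):
        return Verdict.B
    # In every other case the judge declines to pick a winner.
    return Verdict.ABSTAIN
```

Under this sketch, a debate where both sides look sufficient, or where the only sufficient-looking case was rebutted, ends in abstention instead of a forced choice, which is the "graceful failure" behavior discussed above.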