What are the considerations around whether to structure the debate so that the judge can abstain (as Michael et al. do, by allowing the judge to end the round with low credence) versus forcing the judge to pick an answer each time? Are there pros and cons to each approach? Are there arguments that one or the other is more similar to the real AI debates that might be held in the future?
It’s possible I’m misremembering or misunderstanding the debate protocols used here or in that other paper.
I think allowing the judge to abstain is a reasonable addition to the protocol—we mainly didn’t do this for simplicity, but it’s something we’re likely to incorporate in future work.
The main reason you might want to give the judge this option is that it makes it even harder for a dishonest debater to come out ahead, since (ideally) the judge will rule in favor of the dishonest debater only if all three of the following hold: the honest debater fails to rebut the dishonest debater’s arguments, the judge finds the dishonest debater’s arguments sufficient, and the judge finds the honest debater’s arguments insufficient. Of course, this also makes the honest debater’s job significantly harder, but I think we’re fine with that to some degree, insofar as we believe that the honest debater has a built-in advantage anyway (which is sort of a foundational assumption of Debate).
It’s also not clear that abstention is necessary, though: we’re primarily viewing Debate as a protocol for low-stakes alignment, where we care about average-case performance, so this kind of “graceful failure” seems less important.
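To make the difference between the two judging rules concrete, here is a minimal sketch. It assumes the judge reports a single credence that debater A is correct; the function names and the 0.9 threshold are hypothetical choices for illustration, not values taken from either paper.

```python
from typing import Optional


def forced_verdict(credence_a: float) -> str:
    """Forced choice: the judge must name a winner, however unsure."""
    if not 0.0 <= credence_a <= 1.0:
        raise ValueError("credence_a must be in [0, 1]")
    return "A" if credence_a >= 0.5 else "B"


def verdict_with_abstention(
    credence_a: float, threshold: float = 0.9
) -> Optional[str]:
    """Name a winner only if the judge's credence clears the threshold.

    Returns None (abstain) when neither side is convincing enough.
    The default threshold of 0.9 is an illustrative assumption.
    """
    if not 0.0 <= credence_a <= 1.0:
        raise ValueError("credence_a must be in [0, 1]")
    if not 0.5 < threshold <= 1.0:
        raise ValueError("threshold must be in (0.5, 1]")
    if credence_a >= threshold:
        return "A"
    if 1.0 - credence_a >= threshold:
        return "B"
    return None  # low credence either way: abstain


# A judge who is only 55% sure still has to pick under forced choice,
# but abstains when abstention is allowed.
print(forced_verdict(0.55))
print(verdict_with_abstention(0.55))
```

Under this sketch, a debater wins only by pushing the judge's credence past the threshold, which is why abstention raises the bar for both the dishonest and the honest debater.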