Do you have suggestions for domains where you do expect one-turn debate to work well, now that you’ve got these results?
I have no reason to be especially optimistic given these results, but I suppose there may be some fairly simple questions for which it’s possible to enumerate a complete argument in a way that makes any flaws clearly apparent.
In general, it seems like single-turn debate would have to rely on an extremely careful judge, which we don’t quite have, given the time constraint. Multi-turn seems likely to be more forgiving, especially if the judge has any influence over the course of the debate.