Oh, no need for apologies: I'm certain the post was expressed imperfectly; I was understanding more as I wrote (I hope!). Often the most confusing parts are the most confused.
Since I'm mainly concerned with behaviour-during-training, I don't think the post-training picture is too important to the point I'm making. However, it is interesting to consider what you'd expect to happen after training in the event that the debaters' only convincing "ignore-the-question" arguments are training-signal-based.
I think in that case I’d actually expect debaters to stop ignoring the question (assuming they know the training has stopped). I assume that a general, super-human question answerer must be able to do complex reasoning and generalise to new distributions. Removal of the training signal is a significant distributional shift, but one that I’d expect a general question-answerer to handle smoothly (in particular, we’re assuming it can answer questions about [optimal debating tactics once training has stopped]).
[ETA: I can imagine related issues with high-value-information bribery in a single debate: "Give me a win in this branch of the tree, and I'll give you high-value information in another branch", or the like… though it's a strange bargaining situation, given that in most setups the debaters have identical information to offer. This could occur during or after training, but only in setups where the judge can give reward before the end of the debate… Actually, I'm not sure about that: if the judge always has the option to override earlier decisions with larger later rewards, then mid-debate rewards don't commit the judge in any meaningful way, so they aren't really bargaining chips. So I don't think this style of bribery would work in the setups I've seen.]