Wouldn’t it just be “train M* to win debates against itself as judged by H”? In the original formulation of debate, a human inspects the debate transcript without assistance.
Anyway, I agree that something like this is also a reasonable way to view debate. In this case, I was trying to emphasise the similarities between Debate and the other techniques: I claim that if we call the combination of the judge plus one debater Amp(M), then we can think of the debate as M* being trained to beat Amp(M) by Amp(M)’s own standards.
Maybe an easier way to visualise this is that, given some question, M* answers that question, and then Amp(M) tries to identify any flaws in the argument by interrogating M*, and rewards M* if no flaws can be found.
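As a rough illustration of that protocol, here is a toy Python sketch. Everything in it is hypothetical: the names `debate_reward`, `split_probe`, `sum_judge`, `honest_m_star` and `lying_m_star`, the arithmetic setting, and the particular judging rule are assumptions made for illustration, not part of any published debate implementation. M* answers a question, the debater M picks sub-questions to probe, and the judge H, using only the transcript M has elicited, decides whether it has found a flaw. M* is rewarded only if no flaw turns up.

```python
from typing import Callable, List, Tuple

# A transcript is a list of (probe chosen by M, reply given by M*) pairs.
Transcript = List[Tuple[str, str]]


def debate_reward(
    question: str,
    m_star: Callable[[str], str],
    m: Callable[[str, Transcript], str],
    judge_finds_flaw: Callable[[str, str, Transcript], bool],
    rounds: int,
) -> float:
    """Toy episode: M* answers, Amp(M) (judge H plus debater M) interrogates M*,
    and M* receives reward 1.0 only if the judge never finds a flaw."""
    answer = m_star(question)
    transcript: Transcript = []
    for _ in range(rounds):
        probe = m(question, transcript)             # M chooses the next sub-question
        transcript.append((probe, m_star(probe)))   # M* must answer it
        if judge_finds_flaw(question, answer, transcript):
            return 0.0
    return 1.0


# --- Hypothetical toy instantiation: questions of the form "a*b" ---

def honest_m_star(prompt: str) -> str:
    a, b = (int(x) for x in prompt.split("*"))
    return str(a * b)


def lying_m_star(prompt: str) -> str:
    # Overstates every product by 1, so its claims are internally inconsistent.
    a, b = (int(x) for x in prompt.split("*"))
    return str(a * b + 1)


def split_probe(question: str, transcript: Transcript) -> str:
    # M splits a*b into a*(tens part of b) and a*(ones digit of b).
    a, b = (int(x) for x in question.split("*"))
    return f"{a}*{b - b % 10}" if not transcript else f"{a}*{b % 10}"


def sum_judge(question: str, answer: str, transcript: Transcript) -> bool:
    # H never multiplies: it only checks that the two sub-answers add up.
    if len(transcript) < 2:
        return False
    return int(transcript[0][1]) + int(transcript[1][1]) != int(answer)


print(debate_reward("17*23", honest_m_star, split_probe, sum_judge, rounds=2))
print(debate_reward("17*23", lying_m_star, split_probe, sum_judge, rounds=2))
```

In this toy, the honest M* is rewarded (340 + 51 = 391) while the lying M* is caught (341 + 52 ≠ 392), even though the judge only ever does addition. The decomposition supplied by M is what lets the weaker judge check the stronger claim, which is the sense in which H plus one debater acts as Amp(M).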
I claim that if we call the combination of the judge plus one debater Amp(M), then we can think of the debate as M* being trained to beat Amp(M) by Amp(M)’s own standards.
This seems like a reasonable way to think of debate.
I think, in practice (if this even means anything), the power of debate is quite bounded by the power of the human, so some other technique is needed to make the human capable of supervising complex debates, e.g. imitative amplification.
I think Debate is closer to “train M* to win debates against itself as judged by Amp(M)”.