I think you also need that at least some of the time good arguments are not describably bad
While I agree that there is a significant problem, I’m not confident I’d want to make that assumption.
As I mentioned in the other branch, I was thinking of differences in how easy lies are to find, rather than existence. It seems natural to me to assume that every individual thing does have a convincing counterargument, if we look through the space of all possible strings (not because I’m sure this is true, but because it’s the conservative assumption—I have no strong reason to think humans aren’t that hackable, even if we are less vulnerable to adversarial examples in some sense).
So my interpretation of “finding the honest equilibrium” in debate was: you enter a regime where the (honest) debate strategies are sufficiently powerful that small mutations toward lying are defeated because they’re not lying well.
All of this was an implicit model, not a carefully thought out position on my part. Thus, I was saying things like “50% probability the opponent finds a plausible lie” which don’t make sense as an equilibrium analysis—in true equilibrium, players would know all the plausible lies, and know their opponents knew them, etc.
But, this kind of uncertainty still makes sense for any realistic level of training.
Furthermore, one might hope that the rational-player perspective (in which the risks and rewards of lying are balanced in order to determine whether to lie) simply doesn’t apply, because in order to suddenly start lying well, a player would have to invent the whole art of lying in one gradient descent step. So, if one is sufficiently stuck in an honesty “basin”, one cannot jump over the sides, even if there are perfectly good plays which involve doing so. I offer this as the steelman of the implicit position I had.
Overall, making this argument more explicit somewhat reduces my credence in debate, because:
I was not explicitly recognizing that talk of an “honest equilibrium” relies on assumptions about misleading counterarguments not existing, as opposed to the weaker assumption that they are merely hard to find. (I think this also applies to regular debate, not just my framework here.)
Steelmanning “dishonest arguments are harder to make” as an argument about training procedures, rather than about equilibrium, seems to rest on assumptions which would be difficult to gain confidence in.
−2/+1 Scoring
It’s worth explicitly noting that this weakens my argument for the −2/+1 scoring.
I was arguing that although −2/+1 can seriously disadvantage honest strategies in some cases (as you mention, it could mean the first player can lie, and the second player keeps silent to avoid retribution), it fixes a problem within the would-be honest attractor basin. Namely, I argued that it cut off otherwise problematic cases where dishonest players can force a tie (in expectation) by continuing to argue forever.
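To make the expected-value intuition concrete, here is a toy calculation. The per-round probability that a dishonest continuation is judged plausible, and the specific payoffs, are illustrative assumptions rather than anything from the discussion itself:

```python
# Toy sketch: why a dishonest player can "argue forever" for a tie
# under symmetric scoring, but not under -2/+1 scoring.

def ev_continue_lying(p, win_payoff, loss_payoff):
    """Expected payoff of one more dishonest argument:
    with probability p the lie lands (win_payoff),
    otherwise it is caught and defeated (loss_payoff)."""
    return p * win_payoff + (1 - p) * loss_payoff

p = 0.5  # assumed chance the judge finds the lie plausible this round

# Symmetric +1/-1 scoring: a coin-flip lie is a tie in expectation,
# so a losing player loses nothing by continuing indefinitely.
symmetric = ev_continue_lying(p, win_payoff=+1, loss_payoff=-1)

# -2/+1 scoring: the same lie now has negative expected value
# unless it succeeds more than 2/3 of the time.
asymmetric = ev_continue_lying(p, win_payoff=+1, loss_payoff=-2)

print(symmetric)   # 0.0
print(asymmetric)  # -0.5
```

Under these (assumed) numbers, symmetric scoring makes each additional dishonest argument free in expectation, while −2/+1 makes it costly whenever the per-round plausibility of lies is below 2/3.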
Now, the assumptions under which this is a problem are somewhat complex (as we’ve discussed). But I must assume there is a seemingly plausible counterargument to almost anything (at least enough that the dishonest player can steer toward conversational territory where this is true), which means we can’t be making an argument about the equilibrium being good. Therefore, if this concern is relevant for us, we must be arguing about training rather than equilibrium behavior (in the sense I discussed above).
But if we’re arguing about training, we hopefully still have some assumption about lies being harder to find (during training). So, there should already be some other way to argue that you can’t go on dishonestly arguing forever.
So the situation would have to be pretty weird for −2/+1 to be useful.
(I don’t by any means intend to say that “a dishonest player continuing to argue in order to get a shot at not losing” isn’t a problem—just that if it’s a problem, it’s probably not a problem −2/+1 scoring can help with.)
Yeah, all of this makes sense to me; I agree that you could make an argument about the difference in difficulty of finding defeaters to good vs. bad arguments, and that could then be used to say “debate will in practice lead to honest policies”.