Whoops, I seem to have missed this comment, sorry about that. I think at this point we’re nearly at agreement.
> Ah, I suppose this is still consistent with honesty being an equilibrium. But it would then be a really weak sort of equilibrium—there would be no reason to be honest, but no specific reason to be dishonest, either.
Yeah, I agree this is possible. (The reason to not expect dishonesty is that sometimes you’ll see honest arguments to which there is no dishonest defeater.)
> Then I concede that there is an honest equilibrium where the first player tells the truth, and the second player concedes (or, in simultaneous play, both players tell the truth and then concede). However, it does seem to be an extremely weak equilibrium—the second player is equally happy to lie, starting a back-and-forth chain which is a tie in expectation.
Similar comment here—the more you expect that honest claims will likely have dishonest defeaters, the weaker you expect the equilibrium to be. (E.g. it’s clearly not a tie when honest claims never have dishonest defeaters; in this case first player always wins.)
> It seems plausible to me that there’s an incremental zero-sum scoring rule; EG, every convincing counterargument takes 1 point from the other player, so any dishonest statement is sure to lose you a point (in equilibrium). The hope would be that you always prefer to concede rather than argue, even if you’re already losing, in order to avoid losing more points.
>
> However, this doesn’t work, because a dishonest (but convincing) argument gives you +1, and then −1 if it is refuted; so at worst it’s a wash. So again it’s a weak equilibrium, and if there’s any imperfection in the equilibrium at all, it actively incentivises lying when you would otherwise concede (because you want to take the chance that the opponent will not manage to refute your argument).
>
> This was the line of reasoning which led me to the scoring rule in the post, since making it a −2 (but still only +1 for the other player) solves that issue.
On the specific −2/+1 proposal, the issue is that the first player just makes some dishonest argument, and the second player concedes, because even if they give an honest defeater, the first player could then re-defeat it with a dishonest defeater. (I realize I’m just repeating myself here; there’s more discussion in the next section.)
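To make the payoff arithmetic concrete, here is a minimal sketch of both incremental rules. The `tally` helper, the `penalty` parameter, and the decision to treat the opening statement as simply earning +1 are illustrative modelling choices of mine, not anything from the post:

```python
# Toy tally of the incremental scoring rules discussed above (illustrative only).
# Assumptions: players 0 and 1 alternate, each statement defeats the previous
# one, a convincing statement earns its author +1, and the author of the
# defeated statement loses `penalty` (1 for the incremental zero-sum rule,
# 2 for the post's -2/+1 rule).

def tally(num_statements: int, penalty: int) -> list[int]:
    scores = [0, 0]
    for i in range(num_statements):
        author = i % 2
        scores[author] += 1                # statement i is convincing when made
        if i > 0:
            scores[1 - author] -= penalty  # and it defeats the previous statement
    return scores

# penalty=1: lie, get refuted, concede -- the refuted lie is "a wash" for player 0.
print(tally(2, penalty=1))  # [0, 1]

# penalty=2: the same refuted lie now costs player 0 a point on net.
print(tally(2, penalty=2))  # [-1, 1]

# penalty=2: dishonest claim, honest defeater, dishonest re-defeat, concession.
# Player 1 ends at -1, worse than the 0 they would keep by conceding right away
# (immediate concession leaves the scores at [1, 0]).
print(tally(3, penalty=2))  # [0, -1]
```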
But more broadly, I claim that given your assumptions there is no possible scoring rule that (in the worst case) makes honesty a unique equilibrium. This worst case is when every argument has a defeater (and in particular, every honest argument has a dishonest defeater).
In this situation, there is no possible way to distinguish between honesty and dishonesty—under your assumptions, the thing that characterizes honesty is that honest arguments (at least sometimes) don’t have defeaters. From the perspective of the players, the salient feature of the game is that they can make statements; all such statements will have defeaters; there’s no information available to them in the structure of the game that distinguishes honesty from dishonesty. Therefore honesty can’t be the unique equilibrium; whatever the policy is, there should be an equivalent one that is at least sometimes dishonest.
In this worst case, I suspect that for any judge-based scoring rule, the equilibrium behavior is either “the first player says something and the second concedes”, or “every player always provides some arbitrary defeater of the previous statement, and the debate never ends / the debate goes to whoever got the last word”.
> The probability of the opponent finding (and using) a dishonest defeater HAS TO be below 50%, in all cases, which is a pretty high bar. Although of course we can make an argument about how that probability should be below 50% if we’re already in an honest-enough regime. (IE we hope that the dishonest player prefers to concede at that point rather than refute the refutation, for the same reason as your argument gives—it’s too afraid of the triple refutation. This is precisely the argument we can’t make in the zero sum case.)
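(For concreteness, my reading of where the 50% figure comes from under the −2/+1 rule: an honest statement earns +1 if it stands and nets $+1 - 2 = -1$ if the opponent finds and uses a dishonest defeater, so with defeat probability $p$ its expected value is roughly $(1-p)(+1) + p(-1) = 1 - 2p$, which beats conceding, worth 0, only when $p < 1/2$.)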
Sorry, I don’t get this. How could we make the argument that the probability is below 50%?
Depending on the answer, I expect I’d follow up with either
1. Why can’t the same argument apply in the zero sum case? or
2. Why can’t the same argument be used to say that the first player is happy to make a dishonest claim? or
3. Why is it okay for us to assume that we’re in an honest-enough regime?
Separately, I’d also want to understand how exactly we’re evading the argument I gave above about how the players can’t even distinguish between honesty and dishonesty in the worst case.
----
Things I explicitly agree with:
> I assume (correct me if I’m wrong) that the scoring rules for “the zero sum setting” are something like: the judge assesses things at the end, giving +1 to the winner and −1 to the loser, or 0 in case of a tie.
and
> Ahhh, this is actually a pretty interesting point, because it almost suggests that honesty is an Evolutionarily Stable Equilibrium, even though it’s only a Weak Nash Equilibrium. But I think that’s not quite true, since the strategy “lie when you would otherwise have to concede, but otherwise be honest” can invade the honest equilibrium. (IE that mutation would not be selected against, and could be actively selected for if we’re not quite in equilibrium, since players might not be quite perfect at finding the honest refutations for all lies.)
> Sorry, I don’t get this. How could we make the argument that the probability is below 50%?
I think my analysis there was not particularly good, and only starts to make sense if we aren’t yet in equilibrium.
> Depending on the answer, I expect I’d follow up with either [...] 3. Why is it okay for us to assume that we’re in an honest-enough regime?
I think #3 is the most reasonable, with the answer being “I have no reason why that’s a reasonable assumption; I’m just saying, that’s what you’d usually try to argue in a debate context...”
(As I stated in the OP, I have no claims as to how to induce honest equilibrium in my setup.)
I agree that we are now largely in agreement about this branch of the discussion.
> Ah, I suppose this is still consistent with honesty being an equilibrium. But it would then be a really weak sort of equilibrium—there would be no reason to be honest, but no specific reason to be dishonest, either.
>
> Yeah, I agree this is possible. (The reason to not expect dishonesty is that sometimes you’ll see honest arguments to which there is no dishonest defeater.)
>
> Then I concede that there is an honest equilibrium where the first player tells the truth, and the second player concedes (or, in simultaneous play, both players tell the truth and then concede). However, it does seem to be an extremely weak equilibrium—the second player is equally happy to lie, starting a back-and-forth chain which is a tie in expectation.
>
> Similar comment here—the more you expect that honest claims will likely have dishonest defeaters, the weaker you expect the equilibrium to be. (E.g. it’s clearly not a tie when honest claims never have dishonest defeaters; in this case first player always wins.)
My (admittedly conservative) supposition is that every claim does have a defeater which could be found by a sufficiently intelligent adversary, but the difficulty of finding such defeaters can be much higher than the difficulty of finding honest ones.
> But more broadly, I claim that given your assumptions there is no possible scoring rule that (in the worst case) makes honesty a unique equilibrium. This worst case is when every argument has a defeater (and in particular, every honest argument has a dishonest defeater).
>
> In this situation, there is no possible way to distinguish between honesty and dishonesty—under your assumptions, the thing that characterizes honesty is that honest arguments (at least sometimes) don’t have defeaters.
Yep, makes sense. So nothing distinguishes between an honest equilibrium and a dishonest one, for sufficiently smart players.
There is still potentially room for guarantees/arguments about reaching honest equilibria (in the worst case) based on the training procedure, due to the idea that the honest defeaters are easier to find.