I think this is only true when you have turn-by-turn play and your opponent has already “claimed” the honest debater role.
Yeah, I was assuming turn-by-turn play.
In the simultaneous play setting, I think you expect both agents to be honest.
This is a significant point that I was missing: I had assumed that in simultaneous play, the players would randomize, so as to avoid choosing the same answer, since choosing the same answer precludes winning. However, if choosing a worse answer means losing, then players prefer a draw.
But I’m not yet convinced, because there’s still the question of whether choosing the worse answer means losing. The “clawing” argument still suggests that choosing the worse answer may yield a draw (in expectation), even in simultaneous play. (IE, what if the should-be loser attacks the winner, and they go back and forth, with winner depending on last word?)
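To make that concrete, here is a minimal sketch of the clawing dynamic, assuming the zero-sum end-scoring discussed below (+1 to the winner, −1 to the loser) and that each player can always find some defeater for the previous statement, so the judge sides with whoever gets the last word:

$$\mathbb{E}[\text{score of should-be loser}] = \Pr[\text{last word}]\cdot(+1) + (1-\Pr[\text{last word}])\cdot(-1) = 0 \quad \text{when } \Pr[\text{last word}] = \tfrac{1}{2}.$$

So if neither side has an edge on getting the last word, attacking the better answer recovers a draw in expectation rather than a loss.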
Ah, I suppose this is still consistent with honesty being an equilibrium. But it would then be a really weak sort of equilibrium—there would be no reason to be honest, but no specific reason to be dishonest, either.
Zero-sum setting, argument that honesty is an equilibrium (for the first player in a turn-by-turn game, or either player in a simultaneous-action game):
If you are always honest, then whenever you can take an action, there will exist a defeater (by your assumption), therefore you will have at least as many options as any non-honest policy (which may or may not have a defeater). Therefore you maximize your value by being honest.
There always exists an honest defeater to dishonest arguments, but never to honest arguments. (I should have explicitly assumed this.) Therefore, you are significantly tying your hands by being honest: you don’t have a way to refute honest arguments. (Which you would like to do, since in the zero-sum setting, this may be the only way to recover points.)
I assume (correct me if I’m wrong) that the scoring rules for “the zero-sum setting” are something like: the judge assesses things at the end, giving +1 to the winner and −1 to the loser, or 0 in case of a tie.
Then I concede that there is an honest equilibrium where the first player tells the truth, and the second player concedes (or, in simultaneous play, both players tell the truth and then concede). However, it does seem to be an extremely weak equilibrium—the second player is equally happy to lie, starting a back-and-forth chain which is a tie in expectation.
It seems plausible to me that there’s an incremental zero-sum scoring rule; EG, every convincing counterargument takes 1 point from the other player, so any dishonest statement is sure to lose you a point (in equilibrium). The hope would be that you always prefer to concede rather than argue, even if you’re already losing, in order to avoid losing more points.
However, this doesn’t work, because a dishonest (but convincing) argument gives you +1, and then −1 if it is refuted; so at worst it’s a wash. So again it’s a weak equilibrium, and if there’s any imperfection in the equilibrium at all, it actively incentivises lying when you would otherwise concede (because you want to take the chance that the opponent will not manage to refute your argument).
This was the line of reasoning which led me to the scoring rule in the post, since making it a −2 (but still only +1 for the other player) solves that issue.
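Spelling out the arithmetic behind that fix (a sketch; $q$ is my notation for the probability that a dishonest argument goes unrefuted): under the symmetric ±1 transfer, lying rather than conceding yields

$$\mathbb{E}[\text{lie}] = q\,(+1) + (1-q)\,(+1-1) = q \;\ge\; 0 = \mathbb{E}[\text{concede}],$$

so lying weakly dominates conceding. Under the −2 penalty it instead yields

$$\mathbb{E}[\text{lie}] = q\,(+1) + (1-q)\,(+1-2) = 2q-1,$$

which is strictly worse than conceding whenever $q < 1/2$.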
When arguments do terminate quickly enough (the maximum depth of the game tree is less than the debate length), the honest player always gets the “last word” (the point at which a dishonest defeater no longer exists), and so honesty always wins and is the unique equilibrium.
I agree that if we assume honesty eventually wins if arguments are long enough (IE, eventually you get to an honest argument which has no dishonest defeater), then there would be an honest equilibrium, and no dishonest equilibrium.
More broadly, I note that the “clawing” argument only applies when facing an honest opponent. Otherwise, you should just use honest counterarguments.
Ahhh, this is actually a pretty interesting point, because it almost suggests that honesty is an Evolutionarily Stable Equilibrium, even though it’s only a Weak Nash Equilibrium. But I think that’s not quite true, since the strategy “lie when you would otherwise have to concede, but otherwise be honest” can invade the honest equilibrium. (IE that mutation would not be selected against, and could be actively selected for if we’re not quite in equilibrium, since players might not be quite perfect at finding the honest refutations for all lies.)
I also don’t really understand the hope in the non-zero-sum case here—in the non-zero-sum setting, as you mention, the first player can be dishonest, and then the second player concedes rather than giving an honest defeater that will then be re-defeated by the first (dishonest) player. This seems like worse behavior than what happens in the zero-sum case.
You’re right, that’s really bad. The probability of the opponent finding (and using) a dishonest defeater HAS TO be below 50%, in all cases, which is a pretty high bar. Although of course we can make an argument about how that probability should be below 50% if we’re already in an honest-enough regime. (IE we hope that the dishonest player prefers to concede at that point rather than refute the refutation, for the same reason as your argument gives—it’s too afraid of the triple refutation. This is precisely the argument we can’t make in the zero sum case.)
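For concreteness, the 50% bar falls out of the same −2/+1 arithmetic as above (with $p$ my notation for the probability that the opponent finds and uses a dishonest defeater): an honest refutation earns +1 but takes the −2 penalty if it is itself re-defeated, so

$$\mathbb{E}[\text{refute}] = (1-p)\,(+1) + p\,(+1-2) = 1-2p,$$

which beats conceding (0) only when $p < 1/2$.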
Whoops, I seem to have missed this comment, sorry about that. I think at this point we’re nearly at agreement.
Ah, I suppose this is still consistent with honesty being an equilibrium. But it would then be a really weak sort of equilibrium—there would be no reason to be honest, but no specific reason to be dishonest, either.
Yeah, I agree this is possible. (The reason to not expect dishonesty is that sometimes you’ll see honest arguments to which there is no dishonest defeater.)
Then I concede that there is an honest equilibrium where the first player tells the truth, and the second player concedes (or, in simultaneous play, both players tell the truth and then concede). However, it does seem to be an extremely weak equilibrium—the second player is equally happy to lie, starting a back-and-forth chain which is a tie in expectation.
Similar comment here—the more you expect that honest claims will likely have dishonest defeaters, the weaker you expect the equilibrium to be. (E.g. it’s clearly not a tie when honest claims never have dishonest defeaters; in this case first player always wins.)
It seems plausible to me that there’s an incremental zero-sum scoring rule; EG, every convincing counterargument takes 1 point from the other player, so any dishonest statement is sure to lose you a point (in equilibrium). The hope would be that you always prefer to concede rather than argue, even if you’re already losing, in order to avoid losing more points.
However, this doesn’t work, because a dishonest (but convincing) argument gives you +1, and then −1 if it is refuted; so at worst it’s a wash. So again it’s a weak equilibrium, and if there’s any imperfection in the equilibrium at all, it actively incentivises lying when you would otherwise concede (because you want to take the chance that the opponent will not manage to refute your argument).
This was the line of reasoning which led me to the scoring rule in the post, since making it a −2 (but still only +1 for the other player) solves that issue.
On the specific −2/+1 proposal, the issue is that then the first player just makes some dishonest argument, and the second player concedes because even if they give an honest defeater, the first player could then re-defeat that with a dishonest defeater. (I realize I’m just repeating myself here; there’s more discussion in the next section.)
But more broadly, I claim that given your assumptions there is no possible scoring rule that (in the worst case) makes honesty a unique equilibrium. This worst case is when every argument has a defeater (and in particular, every honest argument has a dishonest defeater).
In this situation, there is no possible way to distinguish between honesty and dishonesty—under your assumptions, the thing that characterizes honesty is that honest arguments (at least sometimes) don’t have defeaters. From the perspective of the players, the salient feature of the game is that they can make statements; all such statements will have defeaters; there’s no information available to them in the structure of the game that distinguishes honesty from dishonesty. Therefore honesty can’t be the unique equilibrium; whatever the policy is, there should be an equivalent one that is at least sometimes dishonest.
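To put this slightly more formally (my own sketch of the point, not anything assumed above): in the worst case the players see a set of statements $S$ with a defeater relation $D \subseteq S \times S$ in which every statement has a defeater, plus an honesty labeling $h : S \to \{\text{honest}, \text{dishonest}\}$ that no judge-based payoff ever references. Any policy obtained by swapping honest statements for dishonest ones occupying the same position in $D$ earns identical payoffs, so honesty cannot be a strictly better reply.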
In this worst case, I suspect that for any judge-based scoring rule, the equilibrium behavior is either “the first player says something and the second concedes”, or “every player always provides some arbitrary defeater of the previous statement, and the debate never ends / the debate goes to whoever got the last word”.
The probability of the opponent finding (and using) a dishonest defeater HAS TO be below 50%, in all cases, which is a pretty high bar. Although of course we can make an argument about how that probability should be below 50% if we’re already in an honest-enough regime. (IE we hope that the dishonest player prefers to concede at that point rather than refute the refutation, for the same reason as your argument gives—it’s too afraid of the triple refutation. This is precisely the argument we can’t make in the zero sum case.)
Sorry, I don’t get this. How could we make the argument that the probability is below 50%?
Depending on the answer, I expect I’d follow up with either
1. Why can’t the same argument apply in the zero sum case? or
2. Why can’t the same argument be used to say that the first player is happy to make a dishonest claim? or
3. Why is it okay for us to assume that we’re in an honest-enough regime?
Separately, I’d also want to understand how exactly we’re evading the argument I gave above about how the players can’t even distinguish between honesty and dishonesty in the worst case.
----
Things I explicitly agree with:
I assume (correct me if I’m wrong) that the scoring rules for “the zero-sum setting” are something like: the judge assesses things at the end, giving +1 to the winner and −1 to the loser, or 0 in case of a tie.
and
Ahhh, this is actually a pretty interesting point, because it almost suggests that honesty is an Evolutionarily Stable Equilibrium, even though it’s only a Weak Nash Equilibrium. But I think that’s not quite true, since the strategy “lie when you would otherwise have to concede, but otherwise be honest” can invade the honest equilibrium. (IE that mutation would not be selected against, and could be actively selected for if we’re not quite in equilibrium, since players might not be quite perfect at finding the honest refutations for all lies.)
Sorry, I don’t get this. How could we make the argument that the probability is below 50%?
I think my analysis there was not particularly good, and only starts to make sense if we aren’t yet in equilibrium.
Depending on the answer, I expect I’d follow up with either [...] 3. Why is it okay for us to assume that we’re in an honest-enough regime?
I think #3 is the most reasonable, with the answer being “I have no reason why that’s a reasonable assumption; I’m just saying, that’s what you’d usually try to argue in a debate context...”
(As I stated in the OP, I have no claims as to how to induce honest equilibrium in my setup.)
I agree that we are now largely in agreement about this branch of the discussion.
Ah, I suppose this is still consistent with honesty being an equilibrium. But it would then be a really weak sort of equilibrium—there would be no reason to be honest, but no specific reason to be dishonest, either.
Yeah, I agree this is possible. (The reason to not expect dishonesty is that sometimes you’ll see honest arguments to which there is no dishonest defeater.)
Then I concede that there is an honest equilibrium where the first player tells the truth, and the second player concedes (or, in simultaneous play, both players tell the truth and then concede). However, it does seem to be an extremely weak equilibrium—the second player is equally happy to lie, starting a back-and-forth chain which is a tie in expectation.
Similar comment here—the more you expect that honest claims will likely have dishonest defeaters, the weaker you expect the equilibrium to be. (E.g. it’s clearly not a tie when honest claims never have dishonest defeaters; in this case first player always wins.)
My (admittedly conservative) supposition is that every claim does have a defeater which could be found by a sufficiently intelligent adversary, but the difficulty of finding such (dishonest) defeaters can be much higher than that of finding honest ones.
But more broadly, I claim that given your assumptions there is no possible scoring rule that (in the worst case) makes honesty a unique equilibrium. This worst case is when every argument has a defeater (and in particular, every honest argument has a dishonest defeater).
In this situation, there is no possible way to distinguish between honesty and dishonesty—under your assumptions, the thing that characterizes honesty is that honest arguments (at least sometimes) don’t have defeaters.
Yep, makes sense. So nothing distinguishes between an honest equilibrium and a dishonest one, for sufficiently smart players.
There is still potentially room for guarantees/arguments about reaching honest equilibria (in the worst case) based on the training procedure, due to the idea that the honest defeaters are easier to find.