As far as I understand, the main thing that is missing is a solid theory of logical counterfactuals. The main question is: In the counterfactual scenario in which TDT recommends action X to agent A, what would another agent B do? How does the thought process of A correlate with the thought process of B?
There are some games mentioned in the FDT and TDT papers which clearly involve multiple TDT agents. The FDT paper mentions that TDTs “form successful voting coalitions in elections”, and the TDT paper mentions that TDTs cooperate in the Prisoner’s Dilemma. In those games we can easily tell how the thought process of one agent correlates with that of the other agents, because there is an obvious symmetry between all agents, so that all agents will always do the same thing.
In a blackmail scenario it’s not so obvious, but I do think there is a certain symmetry between rejecting all blackmail and sending all blackmail. If the blackmailer B writes a letter saying “Hand over your utility point or we will both get -∞ utility” and thinks about whether to send it or not, and then the victim A comes along and says “I reject all blackmail. This means: Do not send your letter, or we will both get -∞ utility”, then B has been blackmailed by A. So there is a symmetry here. If A rejects all blackmail regardless of causal consequences, then B will also reject A’s “blackmail” that tries to extort him into not sending his letter, and will send his letter to A anyway, regardless of causal consequences.
So I no longer believe the claim that TDT agents simply avoid all negative-sum trades.
-”The main question is: In the counterfactual scenario in which TDT recommends action X to agent A, what would another agent B do?”
This is actually not the main issue. If you fix an algorithm X for agent A to use, then the question “what would agent B do if he is using TDT and knows that agent A is using algorithm X?” has a well-defined answer, say f(X). The question “what would agent A do if she knows that whatever algorithm X she uses, agent B will use counter-algorithm f(X)” then also has a well-defined answer, say Z. So you could define “the result of TDT agents A and B playing against each other” to be where A plays Z and B plays f(Z). The problem is that this setup is not symmetric, and would yield a different result if we switched the order of A and B.
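To make the asymmetry concrete, here is a minimal sketch of the construction above on a toy game. Everything in it is an assumption for illustration: the "algorithms" are simplified to plain actions, the payoff table is a made-up game of Chicken, and the names `PAYOFFS`, `outcome_when_a_commits_first` and `outcome_when_b_commits_first` are hypothetical. It is not meant as a definition of TDT, only as an instance of "A picks Z, B plays f(Z)" versus the same thing with the roles swapped.

```python
# Toy illustration only: "algorithms" are simplified to single actions,
# and the payoffs are an invented game of Chicken.
ACTIONS = ("swerve", "straight")

# PAYOFFS[(a_action, b_action)] = (utility of A, utility of B)
PAYOFFS = {
    ("swerve", "swerve"): (0, 0),
    ("swerve", "straight"): (-1, 1),
    ("straight", "swerve"): (1, -1),
    ("straight", "straight"): (-10, -10),
}


def outcome_when_a_commits_first():
    """A fixes X first; B answers with f(X). Returns (A's action, B's action)."""

    def f(x):
        # f(X): B's best reply once A's choice X is fixed and known.
        return max(ACTIONS, key=lambda y: PAYOFFS[(x, y)][1])

    # Z: A's best choice, anticipating that B will answer with f(X).
    z = max(ACTIONS, key=lambda x: PAYOFFS[(x, f(x))][0])
    return z, f(z)


def outcome_when_b_commits_first():
    """Same construction with the roles of A and B swapped."""

    def g(y):
        # g(Y): A's best reply once B's choice Y is fixed and known.
        return max(ACTIONS, key=lambda x: PAYOFFS[(x, y)][0])

    # W: B's best choice, anticipating that A will answer with g(Y).
    w = max(ACTIONS, key=lambda y: PAYOFFS[(g(y), y)][1])
    return g(w), w


print(outcome_when_a_commits_first())  # ('straight', 'swerve')
print(outcome_when_b_commits_first())  # ('swerve', 'straight')
```

In this toy game whoever is treated as moving "first" gets the good outcome, so the two orderings disagree about what the two agents do, which is the sense in which the definition depends on the order of A and B.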
-”In a blackmail scenario it’s not so obvious, but I do think there is a certain symmetry between rejecting all blackmail and sending all blackmail.”
The symmetry argument only works when you have exact symmetry, though. To recall, the argument is that by controlling the output of the TDT algorithm in player A’s position, you are also by logical necessity controlling the output in player B’s position, hence TDT can act as though it controls player B’s action. If there is even the slightest difference between player A and player B, then there is no logical necessity and the argument doesn’t work. For example, in a prisoner’s dilemma where the payoffs are not quite symmetric, TDT says nothing.
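A rough sketch of how I read the symmetry argument, under stated assumptions: the payoff numbers are invented, and `is_symmetric` and `symmetric_choice` are hypothetical helpers, not anything taken from the TDT or FDT papers. When the game is exactly symmetric, both players run the same computation on the same input, so only the "both play the same action" outcomes are reachable and the agent just picks the best of those. When the symmetry is broken even slightly, this particular argument has nothing to say, which the sketch encodes by returning `None`.

```python
# Payoffs are (row player's utility, column player's utility); numbers are invented.
PD = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

# The same game with one payoff nudged, so the players are no longer interchangeable.
SKEWED_PD = {**PD, ("C", "C"): (3, 2)}


def is_symmetric(payoffs):
    """True if swapping the two players leaves the game unchanged."""
    return all(
        payoffs[(a, b)] == payoffs[(b, a)][::-1]
        for (a, b) in payoffs
    )


def symmetric_choice(payoffs):
    """Apply the symmetry argument, or return None when it does not apply."""
    if not is_symmetric(payoffs):
        # No exact symmetry: the argument gives no answer here.
        return None
    actions = sorted({a for (a, _) in payoffs})
    # Both players output the same action, so only diagonal outcomes are reachable.
    return max(actions, key=lambda a: payoffs[(a, a)][0])


print(symmetric_choice(PD))         # C
print(symmetric_choice(SKEWED_PD))  # None
```

Returning `None` is only a way of writing down "the argument is silent"; it does not claim that a fuller decision theory could not give an answer in the skewed game.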
-”So I no longer believe the claim that TDT agents simply avoid all negative-sum trades.”
I agree with you, but I think that’s because TDT is actually undefined in scenarios where negative-sum trading might occur.