So, if an agent hears of your pre-commitment, then that agent merely needs to ensure that you don’t hear that it has heard of your pre-commitment in order to be able to blackmail you?
If you’re uncertain about whether or not your blackmailer has heard of your pre-commitment, then you should act as if they have, and ignore their blackmail accordingly. This also applies to agents who have deleted knowledge of your pre-commitment from their memories; you want to punish agents who spend time trying to think up loopholes in your pre-commitment, not reward them. The harder part, of course, is determining what threshold of uncertainty is required; to this I freely admit that I don’t know the answer.
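To make the intended rule concrete, here is a minimal sketch in Python; the names (p_heard, threshold) and the idea of a single scalar cutoff are my own illustration rather than a worked-out proposal:

```python
# Minimal sketch of the rule above. p_heard is my subjective probability that
# the blackmailer has heard of the pre-commitment (counting agents who heard
# of it and then deleted the memory), and threshold is the still-unknown
# cutoff discussed above.

def should_comply(p_heard: float, threshold: float) -> bool:
    """Give in to blackmail only if it is sufficiently unlikely that the
    blackmailer knows about the pre-commitment; otherwise ignore it."""
    return p_heard < threshold
```

The open question is, of course, what value of threshold (if any) actually makes this rule worth pre-committing to.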
EDIT: More generally, it seems that this is an instance of a broader problem: namely, the problem of obtaining information. Given perfect information, the decision theory works out, but by disallowing my agent access to certain key pieces of information regarding the blackmailer, you can force a sub-optimal outcome. Moreover, this seems to be true for any strategy that depends on your opponent’s epistemic state; you can always force that strategy to fail by denying it the information it needs. The only strategies immune to this seem to be the extremely general ones (like “Defect in one-shot Prisoner’s Dilemmas”), but those are guaranteed to produce a sub-optimal result in a number of cases (if you’re playing against a TDT/UDT-like agent, for example).
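As a toy illustration of that last point (assuming the usual Prisoner’s Dilemma payoffs T=5, R=3, P=1, S=0, and an opponent that cooperates exactly when it predicts cooperation):

```python
# Toy payoff check, using assumed standard PD payoffs (T=5, R=3, P=1, S=0).
# A TDT/UDT-like opponent cooperates exactly when it predicts cooperation,
# so a hard-coded "always defect" policy locks in mutual defection.

PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def tdt_like(predicted_move: str) -> str:
    # Cooperate only against a predicted cooperator.
    return "C" if predicted_move == "C" else "D"

print(PAYOFF[("D", tdt_like("D"))])  # 1 -- what "always defect" actually gets
print(PAYOFF[("C", tdt_like("C"))])  # 3 -- the mutual-cooperation payoff it forgoes
```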
Hmmm. If an agent can work out what threshold of uncertainty you have decided on, and then engineer a situation where you think it is less likely than that threshold that the agent has heard of your pre-commitment, then your strategy will fail.
So, even if you do find a way to calculate the ideal threshold, it will fail against any agent smart enough to repeat that calculation; unless, of course, you simply assume that all possible agents have necessarily heard of your pre-commitment (since an agent cannot engineer a situation in which you assign a less-than-0% chance to its having heard of your pre-commitment). This, however, causes the strategy to simplify to “always reject blackmail, whether or not the agent has heard of your pre-commitment”.
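In code, continuing your illustrative should_comply sketch (and assuming, purely for illustration, that the adversary can steer your estimate anywhere down to, but not below, zero):

```python
# Continuation of the illustrative should_comply sketch from above. An adversary
# that can recompute your threshold just engineers your estimate to sit below it;
# the only threshold no non-negative estimate can get under is zero, at which
# point the rule is simply "always reject blackmail".

def should_comply(p_heard: float, threshold: float) -> bool:
    return p_heard < threshold

def engineered_estimate(threshold: float) -> float:
    # The adversary hides its knowledge just well enough to push your estimate
    # barely under the threshold -- impossible only when the threshold is 0.
    return max(0.0, threshold - 1e-9)

assert should_comply(engineered_estimate(0.3), threshold=0.3)      # blackmail succeeds
assert not should_comply(engineered_estimate(0.0), threshold=0.0)  # degenerate case: always reject
```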
Alternatively, you can ensure that any agent able to capture you in a simulation must also know of your pre-commitment; for example, by having it tattooed on yourself somewhere (thus, any agent which rebuilds a simulation of your body must include the tattoo, and therefore must know of the pre-commitment).
If you make me play the Iterated Prisoner’s Dilemma with shared source code, I can come up with a provably optimal solution against whatever opponent I’m playing against
Doesn’t that implicate the halting problem?
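To make the concern concrete, here is a rough sketch (my own illustration, not the setup from the quoted comment) of why “optimal play against whatever opponent I’m facing” seems to require simulating the opponent’s source, which need not terminate:

```python
# Rough illustration of the halting-problem worry. In a shared-source game the
# natural way to respond "optimally" is to predict the opponent by running its
# code -- but the opponent's code may itself simulate us, or simply never halt.

def my_move(opponent_source: str, my_source: str) -> str:
    namespace: dict = {}
    exec(opponent_source, namespace)          # assumed protocol: the source defines move()
    predicted = namespace["move"](my_source)  # may recurse into simulating us, or loop forever
    return "C" if predicted == "C" else "D"   # cooperate iff the opponent is predicted to
```

No finite analysis can decide, in general, whether that inner call ever returns, which is the part that looks like the halting problem to me.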
Argh, you ninja’d my edit. I have now removed that part of my comment (since it seemed somewhat irrelevant to my main point).