Hm. I basically always think of logical counterfactual mugging through the lens of ordinary counterfactual mugging.
Suppose Omega says “I would have given you $100 if the second prime number were 4, but only if you would have given me $1 otherwise. Can I have a dollar?” UDT tells you whether or not to pay by constructing some stochastic game that the agent is imagined to be playing, and then following a good strategy for that game. But I don’t think it’s right to construct this game by using our own logical uncertainty about the second prime number, at some artificial state of logical ignorance. Instead, I think a UDT agent should construct this game by trying to model how Omega chose which mathematical fact to condition on in the first place, using its real ignorance about Omega’s initial conditions. (Or not quite its real ignorance—I’m still confused about this part of UDT. Clearly you have to amputate the knowledge that the initial conditions you assign to Omega will lead to it asking you that specific question, but how does this amputation process work precisely? Do the results change if you keep rewinding farther than this, eventually trying to deduce things from the initial conditions of the universe, and if so, how far should you rewind for best results?)
I don’t think using the state of ignorance which Omega used to decide is right. The main purpose of UDT is to get reflective consistency. Choosing a policy from the agent’s own prior is the “trivial solution” to the problem of reflective consistency: the agent has no reason to change from UDT, since the policy is already producing the global behavior it would want to change to.
What’s the motivation to use Omega’s beliefs? From what perspective can we justify that? Yes, doing so causes you to be counterfactually mugged in all cases; but, some counterfactual muggings are just bad deals. You don’t want to give Omega $10 in the real world for the sake of $1M in an a priori incredibly unlikely world, just because Omega thought the other world likely. Do you?
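(To make “bad deal” concrete, here’s a toy expected-value check. The probabilities and payouts below are my own illustrative assumptions, not part of the setup above; the point is just that the policy “pay when asked” only looks good under a prior that gives the reward-world non-trivial probability.)

```python
# Toy expected value of the policy "pay when asked" in a counterfactual
# mugging, evaluated under a given prior. All numbers are illustrative.

def ev_pay(p_reward_world, reward=1_000_000, cost=10):
    """p_reward_world: prior probability of the world where Omega pays out;
    with probability 1 - p_reward_world you are instead asked for `cost`."""
    return p_reward_world * reward - (1 - p_reward_world) * cost

print(ev_pay(1e-9))  # agent's own prior: roughly -10, a bad deal
print(ev_pay(0.5))   # Omega's prior: roughly +499995, a great deal
```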
EDIT:
Maybe I’m wrong.
If Omega imagines you in the incredibly unlikely world, with an altered prior to think that world is likely and the real world is incredibly unlikely, you’d like for that version of yourself to give the $10. But this seems like an unfair problem. You can’t generally do well under conditions where Omega arbitrarily edits your brain. If you see this coming, you’d like to edit yourself to a DT which does well here, but if Omega threatens everything you hold dear and asks you to edit yourself to agent X, you’d edit yourself to agent X (unless X would then be an even worse threat than Omega). Not every case where you want to edit yourself constitutes a reflective inconsistency.
I’m confused about what condition really differentiates this case from regular counterfactual mugging, though. One possible condition is: you are reflectively inconsistent if you regret the decisions made under foreseeable changes in your epistemic state which result from normal operation. If Omega copies you and presents the copy with evidence you yourself currently expect you might see, fine. If Omega copies you and puts a totally different prior in, it’s unfair. An AI might self-edit if it sees such a thing coming, but we aren’t trying to prevent all self-modifications; the AI should self-modify under certain kinds of external incentives.
On the other hand, maybe there’s an interesting DT which bargains with its mind-edited copy, based on uncertainty about whether Omega is going to approach you asking for $10 in this universe or whether this is a universe with a conditional $1M (before we even know what procedure Omega will implement).
Yeah, actually, that sounds quite likely. In the universe with the conditional $1M, the mind-edited you has control over whether you get the money. In the universe where Omega asks you for $10, you have the same power over the mind-edited copy (or so the copy thinks, even though you don’t think the copy even exists if Omega asks you for $10). So, this is like cooperating in the Prisoner’s Dilemma.
In other words, it seems like a DT will act more or less as you suggest if it is updateless and has good bargaining properties with agents whose DT is similar. This line of reasoning still speaks against giving Omega the $10 in the case where your prior gives high probability to Omega asking for the $10 and low probability for the other side—but this makes sense; an Omega like that is a cheapskate not worth dealing with.
Hmm, yeah, I think I was wrong. You probably shouldn’t pay for “the second prime is 4,” unless Omega is doing some really dramatic brain-editing in its counterfactual simulation. This highlights where the center of the problem is—the problem isn’t designing the agent, the problem is understanding, in detail, an Omega that evaluates logical counterfactuals.
But, and this is what tripped me up, Omega’s beliefs are also important. Consider the case where Omega is running a search process for complicated false statements. Now you definitely shouldn’t pay Omega, even if you’re a priori uncertain about the statements Omega mugs you with.
This seems to get into problems with what the decision problem actually is. If you’re “updateless” but you already know you’re in this particular counterfactual mugging, you may respond very differently than if you’re merely considering this as one of many possible muggings. Specifically, in the first case, any Omega is a cheapskate Omega if it chooses something like “P and not P”. In the latter case, however, we might know that Omega arrives at absurdities such as this through a fair process which is equally likely to select true sentences. In that case, despite not being updateless enough to be ignorant about “P and not P”, we might go along with the mugging as part of a policy which achieves a high payout on average.
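(A sketch of the “one of many possible muggings” framing: everything below, including the selection process, probabilities, and payouts, is an assumption for illustration. The “search for complicated false statements” Omega from the previous paragraph makes the paying policy a guaranteed loss, while a fair process that picks true statements half the time makes it a good policy on average.)

```python
# Sketch: average payout of the policy "always pay" when muggings are
# generated by Omega's selection process, rather than evaluated after we
# already know which statement was picked. Process and numbers assumed.

import random

def avg_payout(pay, p_true, reward=1_000_000, cost=10, trials=100_000):
    """p_true: chance Omega's process picks a true statement. If the
    statement is true, a paying agent receives `reward`; if it is false,
    a paying agent hands over `cost`. A refusing agent gets nothing."""
    total = 0
    for _ in range(trials):
        if random.random() < p_true:
            total += reward if pay else 0
        else:
            total -= cost if pay else 0
    return total / trials

print(avg_payout(pay=True,  p_true=0.0))  # search for false statements: -10
print(avg_payout(pay=True,  p_true=0.5))  # fair process: ~ +499995
print(avg_payout(pay=False, p_true=0.5))  # refusing: 0
```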