Could you elaborate more on why you would describe policy selection as “messiness and compromise?”
One point I don’t think I see enough is that policy selection falls naturally out of Savage’s theorem. Sure, Savage’s theorem has a set that we verbally call the set of “Acts,” but anything that obeys the desiderata obeys the theorem! To play the role of an “Act,” something just has to be a function from the things we don’t control to the set of things we care about. Strategies are functions from the things we don’t control to the things we care about, so by Savage’s theorem rational agents (in Savage’s sense) will act as if they’re picking the strategy with the highest expected value.
Of course that’s a very non-technical way of putting it. In the Absent-minded Driver problem, the trouble with stochastic act-based agents is that their preference ordering on individual turns doesn’t obey the ordinal independence postulate. That postulate requires that if one act is better than another (like always going straight being better than always turning), it should still be better if, in some subset of world-states (e.g. the first intersection, if we’re trying to use the set of intersections as the set of states), you did some other fixed act (e.g. always going straight) no matter what. That is, the goodness of applying one “action function” to the “external state” objects in Savage’s theorem can’t depend on what would happen if the state were different. So in the Absent-minded Driver problem, picking the maximum-EV strategy obeys Savage’s theorem, but trying to pick the maximum-EV stochastic action at each intersection does not, because the sets of things you’re trying to plug into the theorem don’t obey the desiderata.
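Here’s a minimal sketch of that distinction, assuming the standard payoffs from Piccione and Rubinstein’s version of the Absent-minded Driver (0 for turning at the first intersection, 4 for turning at the second, 1 for driving past both), which the paragraph above doesn’t spell out. The “Act” that Savage’s theorem lets you score by expected value is the whole strategy: a single probability p of continuing, since the driver can’t tell the intersections apart.

```python
# Expected value of the policy "continue with probability p at every intersection",
# under the assumed payoffs: 0 = exit at the first intersection, 4 = exit at the
# second, 1 = drive past both.
def strategy_ev(p):
    exit_first = (1 - p) * 0          # turn immediately
    exit_second = p * (1 - p) * 4     # continue once, then turn
    past_both = p * p * 1             # continue at both intersections
    return exit_first + exit_second + past_both

# Policy selection: rank whole strategies by expected value and pick the best one.
best_p = max((i / 1000 for i in range(1001)), key=strategy_ev)
print(best_p, strategy_ev(best_p))    # ~0.667, ~1.333

# For reference, the pure strategies: always going straight (p = 1) gives 1,
# always turning (p = 0) gives 0, matching "always straight beats always turning".
```

Nothing in this computation tries to value an individual turn conditional on which intersection you’re at; it only ranks whole strategies, which is exactly the kind of object that satisfies Savage’s desiderata here.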
When we consider logical uncertainty (at least, as formalized by logical induction), policy selection is a messy compromise because it’s not clear how much thinking one should do before using what you know so far to select a policy. There’s no apparent line in the sand between “too updateless” and “not updateless enough”.
I mentioned the example of counterfactual mugging on prime-number factorization. How many heuristics about primality should be considered?
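To make the free parameter explicit, here’s a toy sketch (my own framing, not something from the original discussion) of what policy selection under logical induction looks like; run_logical_inductor and expected_value are hypothetical stand-ins supplied by the caller.

```python
def select_policy(policies, run_logical_inductor, expected_value, n_steps):
    """Freeze whichever policy looks best after n_steps of logical 'thinking'.

    n_steps is the compromise: with small n_steps the agent is arguably too
    updateless (it ignores cheap logical facts, like easy primality heuristics),
    while with large n_steps it is not updateless enough (it has already updated
    on the very facts the mugging turns on). Nothing picks n_steps for you.
    """
    beliefs = run_logical_inductor(n_steps)                       # belief state so far
    return max(policies, key=lambda pi: expected_value(pi, beliefs))
```

The question of how many heuristics about primality to consider is just the question of what to pass in for n_steps.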
Hm. I basically always think of logical counterfactual mugging through the lens of ordinary counterfactual mugging.
Suppose Omega says “I would have given you $100 if the second prime number was 4, but only if you would have given me $1 otherwise. Can I have a dollar?” UDT tells you whether or not to pay by constructing some stochastic game that the agent is imagined to be playing, and then following a good strategy for that game. But I don’t think that it’s right to construct this game by using our own logical uncertainty about the second prime number, at some artificial state of logical ignorance. Instead, I think a UDT agent should construct this game by trying to model how Omega chose which mathematical fact to condition on in the first place, using its real ignorance about Omega’s initial conditions. (Or not quite its real ignorance—I’m still confused about this part of UDT. Clearly you have to amputate the knowledge that the initial conditions you assign to Omega will lead to it asking you that specific question, but how does this amputation process work precisely? Do the results change if you keep rewinding farther than this, eventually trying to deduce things from the initial conditions of the universe, and if so, how far should you rewind for best results?)
I don’t think using the state of ignorance which Omega used to decide is right. The main purpose of UDT is to get reflective consistency, and choosing a policy from the agent’s own prior is the “trivial solution” to that problem: the agent has no reason to change away from UDT, since the policy is already producing the global behavior it would want to change to.
What’s the motivation to use Omega’s beliefs? From what perspective can we justify that? Yes, doing so causes you to be counterfactually mugged in all cases; but, some counterfactual muggings are just bad deals. You don’t want to give Omega $10 in the real world for the sake of $1M in an a priori incredibly unlikely world, just because Omega thought the other world likely. Do you?
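To put a number on “bad deal” (the $10 and $1M are from this example; the threshold is just arithmetic): if your own prior assigns probability p to the branch where Omega pays out, committing to pay has positive expected value only when

$$p \cdot \$1{,}000{,}000 > (1 - p) \cdot \$10, \qquad \text{i.e. } p > \frac{10}{1{,}000{,}010} \approx 10^{-5}.$$

Omega’s confidence never enters that inequality; only your own prior does, which is the sense in which a mugging on a branch you consider wildly unlikely is just a bad deal.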
EDIT:
Maybe I’m wrong.
If Omega imagines you in the incredibly unlikely world, with an altered prior that thinks that world is likely and the real world is incredibly unlikely, you’d like for that version of yourself to give the $10. But this seems like an unfair problem. You can’t generally do well under conditions where Omega arbitrarily edits your brain. If you see this coming, you’d like to edit yourself into a DT which does well here; but if Omega threatens everything you hold dear and asks you to edit yourself into agent X, you’d edit yourself into agent X (unless X would then be an even worse threat than Omega). Not every case where you want to edit yourself constitutes a reflective inconsistency.
I’m confused about what condition really differentiates this case from regular counterfactual mugging, though. One possible condition is: you are reflectively inconsistent if you regret the decisions that follow from foreseeable changes in your epistemic state which result from normal operation. If Omega copies you and presents the copy with evidence you yourself currently expect you might see, fine. If Omega copies you and puts in a totally different prior, it’s unfair. An AI might self-edit if it sees such a thing coming, but we aren’t trying to prevent all self-modifications; the AI should self-modify under certain kinds of external incentives.
On the other hand, maybe there’s an interesting DT which bargains with its mind-edited copy based on uncertainty about whether Omega is going to approach you asking for $10 in this universe or whether this is a universe with a conditional $1M (before we even know what procedure Omega will implement).
Yeah, actually, that sounds quite likely. In the universe with the conditional $1M, the mind-edited you has control over whether you get the money. In the universe where Omega asks you for $10, you have the same power over the mind-edited copy (or so the copy thinks, even though you don’t think the copy even exists if Omega asks you for $10). So, this is like cooperating in the Prisoner’s Dilemma.
In other words, it seems like a DT will act more or less as you suggest if it is updateless and has good bargaining properties with agents whose DT is similar. This line of reasoning still speaks against giving Omega the $10 in the case where your prior gives high probability to Omega asking for the $10 and low probability for the other side—but this makes sense; an Omega like that is a cheapskate not worth dealing with.
Hmm, yeah, I think I was wrong. You probably shouldn’t pay for “the second prime is 4,” unless Omega is doing some really dramatic brain-editing in its counterfactual simulation. This highlights where the center of the problem is—the problem isn’t designing the agent, the problem is understanding, in detail, an Omega that evaluates logical counterfactuals.
But, and this is what tripped me up, Omega’s beliefs are also important. Consider the case where Omega is running a search process for complicated false statements. Now you definitely shouldn’t pay Omega, even if you’re a priori uncertain about the statements Omega mugs you with.
This seems to get into problems with what the decision problem actually is. If you’re “updateless” but you already know you’re in this particular counterfactual mugging, you may respond much differently than if you’re merely considering this as one of many possible muggings. Specifically, in the first case, any Omega is a cheapskate Omega if it chooses something like “P and not P”. In the latter case, however, we might know that Omega arrives at absurdities such as this through a fair process which is equally likely to select true sentences. In that case, despite not being updateless enough to be ignorant about “P and not P”, we might go along with the mugging as part of a policy which achieves a high payout on average.
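To make “high payout on average” concrete (reusing the $10/$1M figures from earlier in this thread, and assuming Omega’s process really is a fair coin over true and false sentences): the policy “pay whenever asked” meets the false-sentence branch half the time and the true-sentence branch half the time, so ex ante it is worth

$$\tfrac{1}{2} \cdot \$1{,}000{,}000 - \tfrac{1}{2} \cdot \$10 = \$499{,}995$$

per mugging, even though any particular mugging on something like “P and not P” looks like a pure $10 loss once you know which sentence was drawn.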