Thanks! :-) But I still don’t understand what made you express the payoff as a function of p. Was it just something you thought of when applying UDT (perhaps after knowing that’s how someone else approached the problem), or is there something about UDT that required you to do that?
What do you mean? p is the only control parameter… You consider a set of “global” mixed strategies, indexed by p, and pick one that leads to the best outcome, without worrying about where your mind that does this calculation is currently located and under what conditions you are thinking this thought.
Perhaps, but it’s an innovation to think of the problem in terms of “solving for the random fraction of times I’m going to do them”. That is, even the idea that you should add randomness between your decision and what you do is an insight. What focused your attention on optimizing with respect to p?
A mixed strategy is a standard concept, so here we are considering the set S of all (global) mixed strategies available for the game. When you are searching for the best strategy, you are maximizing the payoff over S: you are searching for the mixed strategy that gives the best payoff. What UDT tells you is that you should just do that, even if you are considering what to do in a situation where some of the options have already run out, and, as here, even if you have no idea where you are. “The best strategy” quite literally means
$$s^* = \arg\max_{s \in S} EU(s)$$
The only parameter for a given strategy is the probability p of continuing at an intersection, so it’s natural to index the strategies by that probability. This indexing is a mapping $t: [0,1] \to S$ that places a mixed strategy in correspondence with a value of the continuing probability. Now we can rewrite the expected utility maximization in terms of probability:
$$s^* = t(p^*), \quad p^* = \arg\max_{p \in [0,1]} EU(t(p))$$
For a strategy corresponding to continuing probability p, it’s easy to express the corresponding expected utility:
$$EU(t(p)) = (1-p)\cdot 0 + p\cdot\big((1-p)\cdot 4 + p\cdot 1\big) = p^2 + 4p(1-p)$$
We can now find the optimal strategy as
$$s^* = t(p^*), \quad p^* = \arg\max_{p \in [0,1]} \big(p^2 + 4p(1-p)\big)$$
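Carrying out the maximization explicitly: $p^2 + 4p(1-p) = 4p - 3p^2$, and setting the derivative $4 - 6p$ to zero gives

$$p^* = \frac{2}{3}, \quad EU(t(p^*)) = \frac{4}{3}$$

So the optimal strategy continues with probability 2/3, i.e., turns with probability 1/3, matching the “EXIT for 1⁄3 of inputs” answer further down.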
Okay, that’s making more sense—the part where you get to parameterizing the strategies by a real number p is what I was interested in.
But do you do the same thing when applying UDT to Newcomb’s problem? Do you consider it a necessary part of UDT that you take p (with 0<=p<=1) as a continuous parameter to maximize over, where p is the probability of one-boxing?
Fundamentally, this depends on the setting—you might not be given a random number generator (randomness is defined with respect to the game), and so the strategies that depend on a random value won’t be available in the set of strategies to choose from. In Newcomb’s problem, the usual setting is that you have to be fairly deterministic or Omega punishes you (so that a small probability of two-boxing may even be preferable to pure one-boxing, or not, depending on Omega’s strategy), or Omega may be placed so that your strategy is always deterministic for it (effectively, taking mixed strategies out of the set of allowed ones).
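To make that hedge concrete with one hypothetical Omega strategy (an assumption for illustration, not part of the standard problem): suppose Omega fills the opaque box exactly when your probability p of one-boxing is at least $1-\epsilon$. Then for $p \ge 1-\epsilon$,

$$EU(p) = p \cdot 1{,}000{,}000 + (1-p) \cdot 1{,}001{,}000 = 1{,}001{,}000 - 1{,}000\,p$$

which is decreasing in p, so the optimum is $p^* = 1-\epsilon$: two-box with probability $\epsilon$, for an expected $1{,}000{,}000 + 1{,}000\,\epsilon$, slightly better than pure one-boxing. If instead Omega fills the box only for deterministic one-boxers, the mixed strategies drop out of the allowed set and pure one-boxing wins.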
S() is supposed to be an implementation of UDT. By looking at the world program P, it should determine that, among all possible input-output mappings, those that return “EXIT” for 1⁄3 of all inputs (it doesn’t matter which ones) maximize the average payoff. What made me express the payoff as a function of p was stepping through what S is supposed to do as an implementation of UDT.
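As a minimal sketch of that procedure (the toy input space, the structure of P, and the payoff bookkeeping here are illustrative assumptions, not the exact programs from the original post):

```python
from itertools import product

INPUTS = range(3)              # toy input space: 3 equally likely random inputs
ACTIONS = ("EXIT", "CONTINUE")

def expected_payoff(mapping):
    """Average payoff of the world program P when S is fixed to the
    deterministic input-output mapping `mapping` (mapping[i] = S's output on i)."""
    total = 0.0
    for x in INPUTS:                    # random input drawn at intersection X
        if mapping[x] == "EXIT":
            continue                    # exited at X: payoff 0
        for y in INPUTS:                # fresh random input at intersection Y
            total += (4 if mapping[y] == "EXIT" else 1) / len(INPUTS)
    return total / len(INPUTS)

# UDT's choice: enumerate every input-output mapping for S and keep the best.
best = max(product(ACTIONS, repeat=len(INPUTS)), key=expected_payoff)
print(best, expected_payoff(best))
# best payoff is 4/3 ≈ 1.33, achieved by any mapping with exactly one EXIT
```

Every mapping that returns “EXIT” on exactly one of the three inputs ties for the maximum payoff of 4/3, which is the “1⁄3 of inputs, doesn’t matter which ones” result.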
Does that make sense?
I’m still confused. Your response seems to just say, “I did it because it works.”—which is a great reason! But I want to know if UDT gave you more guidance than that.
Does UDT require that you look at the consequences of doing something p% of the time (irrespective of which ones), on all problems?
Basically, I’m in the position of that guy/gal that everyone here probably helped out in high school:
“How do you do the proof in problem 29?”
“Oh, just use identities 3 and 5, solve for t, and plug it back into the original equation.”
“But how did you know to do that?”
No, UDT (at least in my formulation) requires that you look at all possible input-output mappings, and choose the one that is optimal. In this case it so happens that any function that returns “EXIT” for 1⁄3 of inputs is optimal.