Here’s an argument I made in a chat with Wei. (The problem is equivalent to Counterfactual Mugging with a logical coin, so I talk about that instead.)
1) A good decision theory should always do what it would have precommitted to doing.
2) Precommitment can be modeled as a decision problem where an AI is asked to write a successor AI.
3) Imagine the AI is asked to write a program P that will be faced with Counterfactual Mugging with a logical coin (e.g. parity of the millionth digit of pi). The resulting utility goes to the AI. The AI writing P doesn’t have enough resources to compute the coin’s outcome, but P is allowed to use as much resources as needed.
4) Writing P is equivalent to supplying only one bit: should P pay up if asked?
5) Supplying that bit is equivalent to accepting or declining the bet “win $10000 if the millionth digit of pi is even, lose $100 if it’s odd”.
6) So if your AI can make bets about the digits of pi (which means it represents logical uncertainty as probabilities), it should also pay up in Counterfactual Mugging with a logical coin, even if it already has enough resources to compute the coin’s outcome. The AI’s initial state of logical uncertainty should be “frozen” into its utility function, just like all other kinds of uncertainty (the U in UDT means “updateless”).
Maybe this argument only shows that representing logical uncertainty as probabilities is weird. Everyone is welcome to try and figure out a better way :-)
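For concreteness, here is a minimal numeric sketch of the bet in (5) and the conclusion in (6). It assumes, purely for illustration, that the AI assigns probability 0.5 to the millionth digit of pi being even; the particular prior is my assumption, not part of the argument.

```python
# Minimal sketch of the bet in (5): with an assumed 0.5 credence that the
# millionth digit of pi is even, accepting the bet (i.e. writing P so that
# it pays up when asked) has positive expected value.

P_EVEN = 0.5            # assumed prior over the unknown parity
WIN, LOSS = 10_000, 100

ev_accept = P_EVEN * WIN - (1 - P_EVEN) * LOSS   # 0.5 * 10000 - 0.5 * 100 = 4950
ev_decline = 0.0

print("pay up if asked" if ev_accept > ev_decline else "refuse")  # -> pay up if asked
```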
1) A good decision theory should always do what it would have precommitted to doing.
It’s dangerous to phrase it this way, since coordination (which is what really happens) allows using more knowledge than was available at the time of a possible precommitment, as I described here.
4) Writing P is equivalent to supplying only one bit: should P pay up if asked?
Not if the correct decision depends on an abstract fact that you can’t access, but can reference. In that case, P should implement a strategy of acting depending on the value of that fact (computing and observing that value to feed to the strategy). That is, abstract facts that will only be accessible in the future play the same role as observations that will only be accessible in the future, and a strategy can be written conditionally on either.
The difference between abstract facts and observations, however, is that observations may tell you where you are without telling you what exists and what doesn’t (both counterfactuals exist and have equal value; you’re in one of them), while abstract facts can tell you what exists and what doesn’t (the other logical counterfactual doesn’t exist and has zero value).
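A minimal sketch of this conditional-strategy shape, in my own notation (the function names and the stand-in fact are illustrative; a real P would spend its resources computing the actual digit):

```python
# Sketch of a program P whose author can only *reference* an abstract fact,
# while P itself can *compute* the fact later and feed it into a strategy.

from typing import Callable

Fact = bool    # e.g. "the millionth digit of pi is even"
Action = str

def make_P(strategy: Callable[[Fact], Action],
           compute_fact: Callable[[], Fact]) -> Callable[[], Action]:
    """Build P from a strategy that is conditional on an abstract fact."""
    def P() -> Action:
        fact = compute_fact()   # P has enough resources to evaluate this
        return strategy(fact)   # act depending on the fact's value
    return P

def stand_in_fact() -> Fact:
    # Placeholder: a real P would compute the parity of the millionth digit
    # of pi here; the value below is arbitrary.
    return True

P = make_P(strategy=lambda fact: "act one way" if fact else "act the other way",
           compute_fact=stand_in_fact)
print(P())
```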
4) Writing P is equivalent to supplying only one bit: should P pay up if asked?
Not if the correct decision depends on an abstract fact that you can’t access, but can reference.
In general, the distinction is important. But, for this puzzle, the proposition “asked” is equivalent to the relevant “abstract fact”. The agent is asked iff the millionth digit of pi is odd. So point (4) already provides as much of a conditional strategy as is possible.
It’s assumed that the agent doesn’t know if the digit is odd (and whether it’ll be in the situation described in the post) at this point. The proposal to self-modify is a separate event that precedes the thought experiment.
Yes. Similarly, it doesn’t know whether it will be asked (rather than do the asking) at this point.
I see, so there’s indeed just one bit, and it should be “don’t cooperate”.
This is interesting in that UDT likes to ignore the epistemic significance of observations, but here we have an observation that implies something about the world, rather than just telling the agent where it is. How does one reason about strategies if different branches of those strategies tell something about the value of the other branches?
Good point, thanks. I think it kills my argument.
ETA: no, it doesn’t.
As Tyrrell points out, it’s not that simple. When you’re considering the strategy of what to do if you’re on the giving side of the counterfactual (“Should P pay up if asked?”), the fact that you’re in that situation already implies all you wanted to know about the digit of pi, so the strategy is not to play conditionally on the digit of pi, but simply to pay up or not (one bit, as you said). But the value of the decision on that branch of the strategy follows from the logical implications of being on that branch, which is something new for UDT!
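To make the tension concrete, here is a minimal sketch (my own restatement, again assuming a 0.5 prior for illustration) of how the policy “pay up if asked” scores under the updateless evaluation versus an evaluation that conditions on the logical fact implied by being asked:

```python
# Sketch of the tension in the last few comments: the updateless evaluation
# of the policy "pay if asked" versus the evaluation after conditioning on
# the logical fact implied by being asked. The 0.5 prior is an assumption.

P_ODD = 0.5
REWARD, COST = 10_000, 100

# Updateless view: score whole policies against the prior over the digit.
updateless = {
    "pay if asked": (1 - P_ODD) * REWARD + P_ODD * (-COST),   # 4950
    "refuse":       0.0,
}

# Updated view: being asked implies the digit is odd, so the $10000 branch
# is treated as logically nonexistent (zero value).
updated = {
    "pay if asked": -COST,   # -100
    "refuse":       0.0,
}

print(updateless, updated)   # the two views rank the policies oppositely
```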