Oh, lots of open problems remain. Here’s a handy list of what I have in mind right now:
1) 2TDT-1CDT.
2) “Agent simulates predictor”, or ASP: if you have way more computing power than Omega, then Omega can predict that you can obtain its decision just by simulation, so you will two-box; but obviously this isn’t what you want to do.
3) “The stupid winner paradox”: if two superintelligences play a demand game for $10, presumably they can agree to take $5 each to avoid losing it all. But a human playing against a superintelligence can just demand $9, knowing the superintelligence will predict his decision and be left with only $1.
4) “A/B/~CON”: action A gets you $5, action B gets you $10. Additionally you will receive $1 if inconsistency of PA is ever proved. This way you can’t write a terminating utility() function, but can still define the value of utility axiomatically. This is supposed to exemplify all the tractable cases where one action is clearly superior to the other, but total utility is uncomputable.
5) The general case of agents playing a non-zero-sum game against each other, knowing each other’s source code. For example, the Prisoner’s Dilemma with asymmetrized payoffs.
I could make a separate post from this list, but I’ve been making way too many toplevel posts lately.
1) 2TDT-1CDT
How is this not resolved? (My comment and Eliezer’s comment following it; I didn’t re-read the rest of the discussion.)
2) “Agent simulates predictor”
This basically says that the predictor is a rock that doesn’t depend on the agent’s decision, which makes the agent lose because of the way the problem statement argues its way into stipulating (outside of the predictor’s own decision process) that this must be a two-boxing rock rather than a one-boxing rock.
3) “The stupid winner paradox”
Same as (2). We stipulate the weak player to be a $9 rock. Nothing to be surprised about.
4) “A/B/~CON”
Requires the ability to reason under logical uncertainty, comparing theories of consequences and not just specific possible utilities following from specific possible actions. Under any reasonable axioms for valuation of sets of consequences, action B wins.
5) The general case of agents playing a non-zero-sum game against each other, knowing each other’s source code.
Without a good understanding of reasoning under logical uncertainty, this one remains out.
Answering your questions 13 years later because I want to cite cousin_it’s list of open problems, and others may see your comment and wonder what the answers are. I’m not sure about 4 on his list but I think 1, 2, 3, and 5 are definitely still open problems.
Consider the evolutionary version of this. Suppose there’s a group of TDT (or UDT or FDT) agents and one of them got a random mutation that changed it into a CDT agent, and this was known to everyone (but not the identity of the CDT agent). If two randomly selected agents paired off to play true PD against each other, a TDT agent would play C (since they’re still likely facing another TDT agent) and the CDT agent would play D. So the CDT agent would be better off, and it would want to be really careful not to become a TDT agent or delegate to a TDT AI or become accidentally correlated with TDT agents. This doesn’t necessarily mean that TDT/UDT/FDT is wrong, but seems like a weird outcome, plus how do we know that we’re not in a situation like the one that the CDT agent is in (i.e., should be very careful not to become/delegate to a TDT-like agent)?
Eliezer also ended up thinking this might be a real issue.
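To make the arithmetic in the evolutionary version concrete, here is a minimal sketch. The payoff numbers T, R, P, S and the function name `expected_payoffs` are my own illustrative assumptions, not part of the original problem; the only thing taken from the scenario is that every TDT agent plays C and the lone CDT agent plays D.

```python
from fractions import Fraction

# Hypothetical Prisoner's Dilemma payoffs (assumed for illustration): T > R > P > S.
T, R, P, S = 5, 3, 1, 0


def expected_payoffs(n_agents: int) -> tuple[Fraction, Fraction]:
    """Return (TDT agent's expected payoff, CDT agent's payoff), each
    conditional on that agent being one of the two selected to play.

    Assumes a population of n_agents with exactly one CDT mutant, where
    every TDT agent cooperates and the CDT agent defects.
    """
    if n_agents < 2:
        raise ValueError("need at least two agents to pair off")
    # A selected TDT agent's partner is the CDT mutant with probability 1/(n-1).
    p_meet_cdt = Fraction(1, n_agents - 1)
    tdt = (1 - p_meet_cdt) * R + p_meet_cdt * S
    # The CDT mutant always meets a cooperating TDT agent and defects against it.
    cdt = Fraction(T)
    return tdt, cdt


tdt_payoff, cdt_payoff = expected_payoffs(101)
print(tdt_payoff, cdt_payoff)  # 297/100 5
```

Under these assumed payoffs the mutant does strictly better than any TDT agent for every population size, which is the weird outcome described above. The sketch says nothing about whether cooperating is actually what TDT should do here; that is the open question.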
No, we’re not saying the predictor is a rock. We’re assuming that the predictor is using some kind of reasoning process to make the prediction. Specifically, the predictor could reason as follows: The agent is using UDT1.1 (for example). UDT1.1 is not updateless with regard to logical facts. Given enough computing power (which the agent has) it will inevitably simulate me and then update on my prediction, after which it will view two-boxing as having higher expected utility (no matter what my prediction actually is). Therefore I should predict that it will two-box.
No, again the weak player is applying reasoning to decide to demand $9, similar to the reasoning of the predictor above. To spell it out: My opponent is not logically updateless. Whatever I decide, it will simulate me and update on my decision, after which it will play the best response against that. Therefore I should demand $9.
(“Be logically updateless” is the seemingly obvious implication here, but how to do that without running into other issues is also an open problem.)
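Here is a toy sketch of the predictor’s reasoning in ASP, under loud assumptions: the $1,000 small-box amount is the conventional Newcomb figure rather than something stated in this thread, and the names `payoff`, `updating_agent`, and `predictor` are hypothetical. It is not a formalization of UDT1.1; it only shows why an agent that best-responds to an already-known prediction two-boxes whatever the prediction is.

```python
ONE_BOX, TWO_BOX = "one-box", "two-box"
BIG, SMALL = 1_000_000, 1_000  # SMALL is an assumed, conventional amount


def payoff(action: str, prediction: str) -> int:
    """Newcomb payoff: the big box is filled only if one-boxing was predicted."""
    big = BIG if prediction == ONE_BOX else 0
    return big + (SMALL if action == TWO_BOX else 0)


def updating_agent(prediction: str) -> str:
    """Agent that has simulated the predictor and best-responds to its output."""
    return max((ONE_BOX, TWO_BOX), key=lambda a: payoff(a, prediction))


def predictor() -> str:
    """Predictor that reasons about the agent's policy instead of simulating it."""
    responses = {updating_agent(p) for p in (ONE_BOX, TWO_BOX)}
    # If the agent acts the same way whatever is predicted, predict that action.
    # The fallback is arbitrary and is never reached in this toy example.
    return responses.pop() if len(responses) == 1 else TWO_BOX


prediction = predictor()
action = updating_agent(prediction)
print(prediction, action, payoff(action, prediction))  # two-box two-box 1000
```

An agent that never learned the prediction and simply one-boxed against a one-box prediction would get `payoff(ONE_BOX, ONE_BOX)`, i.e. $1,000,000. The sketch does not say how an agent with surplus computing power could reliably be that agent, which is the open part.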
I currently prefer the version of ASP where the agent has the option of simulating the predictor, but doesn’t have to do so or to make any particular use of the results if it does. Dependencies are contingent: the behavior of either party can result in some property of one of them (or of both of them jointly) not getting observed (through reasoning) by the other, including in a way that is predictable to the process choosing that behavior.
True, it doesn’t “depend” on the agent’s decision in the specific sense of “dependency” defined by currently-formulated UDT. The question (as with any proposed DT) is whether that’s in fact the right sense of “dependency” (between action and utility) to use for making decisions. Maybe it is, but the fact that UDT itself says so is insufficient reason to agree.
[EDIT: fixed typo]
The arguments behind UDT’s choice of dependence could prove strong enough to resolve this case as well. The fact that we are arguing about UDT’s answer in no way disqualifies UDT’s arguments.
My current position on ASP is that the reasoning used to motivate it exhibits “explicit dependence bias”. I’ll need to (and probably will) write another top-level post on this topic to improve on what I’ve already written here and on the decision theory list.
About 2TDT-1CDT Wei didn’t seem to consider it 100% solved, as of this August or September if I recall right. You’ll have to ask him.
About ASP I agree with Gary: we do not yet completely understand the implications of the fact that a human like me can win in this situation, while UDT can’t.
About A/B/~CON I’d like to see some sort of mechanical reasoning procedure that leads to the answer. You do remember that Wei’s “existential” patch has been shown to not work, and my previous algorithm without that patch can’t handle this particular problem, right?
(For onlookers: this exchange refers to a whole lot of previous discussion on the decision-theory-workshop mailing list. Read at your own risk.)
Both outcomes are stipulated in the corresponding unrelated decision problems. This is an example of explicit dependency bias, where you consider a collection of problem statements indexed in an arbitrary way by agents’ algorithms or agents’ decisions. Nothing follows from there being a collection with such-and-such consequences of picking a certain element of it. The relation between the agents and the problem statements connected in such a collection is epiphenomenal to the agents’ adequacy. I should probably write up a post to that effect. Only ambient consequences count, where you are already the agent that is part of (a state of knowledge about) an environment and need to figure out what to do, for example which AI to construct and submit your decision to. Otherwise you are changing the problem, not reasoning about what to do in a given problem.
You can infer that A ⇒ U ∈ {5, 6} and B ⇒ U ∈ {10, 11}. Then, instead of only recognizing moral arguments of the form A ⇒ U = U1, you need to be able to recognize such more general arguments. It’s clear which of the two to pick.
Is that the only basis on which UDT or a UDT-like algorithm would decide on such a problem? What about a variant where action A gives you $5, plus $6 iff it is ever proved that P≠NP, and action B gives you $10, plus $5 iff P=NP is ever proved? Here too you could say that A ⇒ U ∈ {5, 11} and B ⇒ U ∈ {10, 15}, but A is probably preferable.
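A small sketch of the difference between the two cases, assuming a naive "every outcome of one action beats every outcome of the other" rule for the set-valued argument. The helper names and the credences below are hypothetical and purely illustrative; nothing here is a claim about how a real UDT-like algorithm assigns logical probabilities.

```python
from fractions import Fraction


def dominates(x_outcomes, y_outcomes) -> bool:
    """True if the worst possible utility of x beats the best possible utility of y."""
    return min(x_outcomes) > max(y_outcomes)


def expected_utility(base, bonus, credence):
    """Base payoff plus a bonus weighted by the credence that the bonus is paid."""
    return base + bonus * credence


# Original A/B/~CON: the outcome sets do not overlap, so B wins outright.
print(dominates({10, 11}, {5, 6}))  # True

# P vs NP variant: the sets overlap, so neither action dominates the other.
print(dominates({10, 15}, {5, 11}), dominates({5, 11}, {10, 15}))  # False False

# With assumed credences that each statement is ever proved, A can come out ahead.
p_neq_proved = Fraction(9, 10)   # hypothetical credence that P≠NP is ever proved
p_eq_proved = Fraction(1, 100)   # hypothetical credence that P=NP is ever proved
eu_a = expected_utility(5, 6, p_neq_proved)
eu_b = expected_utility(10, 5, p_eq_proved)
print(eu_a > eu_b)  # True
```

So the set-membership argument settles the original problem but is silent on the variant, where some way of weighing the logical possibilities seems to be needed.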
If you can predict Omega, but Omega can still predict you well enough for the problem to be otherwise the same, then, given that you anticipate that if you predict Omega’s decision then you will two-box and lose, can’t you choose not to predict Omega (instead deciding the usual way, resulting in one-boxing), knowing that Omega will correctly predict that you will not obtain its decision by simulation?
(Sorry, I know that’s a cumbersome sentence; hope its meaning was clear.)
By “demand game” are you referring to the ultimatum game?
Is the $1 independent of whether you pick action A or action B?
1) The challenge is not solving this individual problem, but creating a general theory that happens to solve this special case automatically. Our current formalizations of UDT fail on ASP—they have no concept of “stop thinking”.
2) No, I mean the game where two players each write a sum of money on a piece of paper; if the total is over $10 then both get nothing, otherwise each player gets the sum they wrote.
3) Yeah, the $1 is independent.
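For concreteness, the demand game from answer 2 can be written down as a sketch like the following. Whole-dollar demands and the names `demand_game` and `best_response` are my assumptions for illustration only.

```python
POT = 10  # dollars at stake, as in the problem statement


def demand_game(a: int, b: int) -> tuple[int, int]:
    """Each player gets what they wrote unless the total exceeds the pot."""
    return (a, b) if a + b <= POT else (0, 0)


def best_response(opponent_demand: int) -> int:
    """Whole-dollar demand that maximizes my payoff against a known demand."""
    return max(range(POT + 1), key=lambda mine: demand_game(mine, opponent_demand)[0])


print(demand_game(5, 5))                   # (5, 5)
print(demand_game(9, 2))                   # (0, 0)
print(demand_game(best_response(9), 9))    # (1, 9)
```

The last line is the stupid winner paradox in miniature: a player who predicts a $9 demand and simply best-responds to it ends up with $1.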
Okay.
So, the superintelligent UDT agent can essentially see through both boxes (whether it wants to or not… or, rather, has no concept of not wanting to). Sorry if this is a stupid question, but wouldn’t UDT one-box anyway, whether the box is empty or contains $1,000,000, for the same reason that it pays in Counterfactual Mugging and Parfit’s Hitchhiker? When the box is empty, it takes the empty box so that there will be possible worlds where the box is not empty (as it would pay the counterfactual mugger so that it will get $10,000 in the other half of worlds), and when the box is not empty, it takes only the one box (despite seeing the extra money in the other box) so that the world it’s in will weigh 50% rather than 0% (as it would pay the driver in Parfit’s Hitchhiker, despite it having “already happened”, so that the worlds in which the hitchhiker gives it a ride in the first place will weigh 100% rather than 0%).
In our current implementations of UDT, the agent won’t find any proof that one-boxing leads to the predictor predicting one-boxing, because the agent doesn’t “know” that it’s only going to use a small fraction of its computing resources while searching for the proof. Maybe a different implementation could fix that.
It’s not an implementation of UDT in the sense that it doesn’t talk about all possible programs and a universal prior over them. If you consider UDT as generalizing to ADT, where probability assumptions are dropped, then sure.
Um, I don’t consider the universal prior to be part of UDT proper. UDT can run on top of any prior, e.g. when you use it to solve toy problems as Wei did, you use small specialized priors.
There are no priors used in those toy problems, just one utility definition of interest.
Well, the use of any priors over possible worlds is the thing I find objectionable.
Cool, thanks.