Okay, because I’m bored and have nothing to do, and I’m not going to be doing serious work today, I’ll explain my reasoning more fully on this problem. As stated:
You face two open boxes, Left and Right, and you must take one of them. In the Left box, there is a live bomb; taking this box will set off the bomb, setting you ablaze, and you certainly will burn slowly to death. The Right box is empty, but you have to pay $100 in order to be able to take it.
A long-dead predictor predicted whether you would choose Left or Right, by running a simulation of you and seeing what that simulation did. If the predictor predicted that you would choose Right, then she put a bomb in Left. If the predictor predicted that you would choose Left, then she did not put a bomb in Left, and the box is empty.
The predictor has a failure rate of only 1 in a trillion trillion. Helpfully, she left a note, explaining that she predicted that you would take Right, and therefore she put the bomb in Left.
You are the only person left in the universe. You have a happy life, but you know that you will never meet another agent again, nor face another situation where any of your actions will have been predicted by another agent. What box should you choose?
Without reference to any particular decision theory, let’s look at which option is actually correct, and then we can see which decision theory would output that action, in order to evaluate which one might “obtain more utility.”
The situation you describe with glass windows is a completely different problem and has a possibly different conclusion, so I’m not going to analyse that one.
Given in the problem statement, we have:
We have the experiential knowledge that we are faced with two open boxes. We have the logically certain knowledge that: 1. we must take one of them; 2. in the Left box is a live bomb; 3. taking Left will set off the bomb, which will then set you on fire and burn you slowly to death; 4. Right is empty, but you have to pay $100 in order to be able to take it.
This is an impossible situation, in that no actual agent could actually be put in this situation. However, so far, the implications are as follows:
Our experiential knowledge may be incorrect. If it is, then the logically certain knowledge can be ignored, because it is as if we have a false statement as the antecedent of the material conditional.
If it isn’t, then the logical implication goes:
We must take at least one box. The Left box contains a live bomb. “Left” here is understood to mean the box experientially to our left when we “magically appeared” at the start of this problem, rather than any box which we might put on our left by, for example, walking around to the other side of whatever thing may be containing the boxes (henceforth referred to as just Left, and symmetrically Right). A “live bomb” is understood to mean a bomb which will by default explode when some triggering process happens, provided that process works correctly.
We are given the logically certain future knowledge that this process will work correctly, and the bomb will explode and burn us slowly to death, if and only if we take the Left box.
We also have the logically certain current knowledge that the Right box is empty, but the “triggering process” that allows us to take it as an option has a prerequisite of “giving up $100,” whatever that means.
Okay so far.
A long-dead predictor predicted whether you would choose Left or Right, by running a simulation of you and seeing what that simulation did. If the predictor predicted that you would choose Right, then she put a bomb in Left. If the predictor predicted that you would choose Left, then she did not put a bomb in Left, and the box is empty.
The predictor has a failure rate of only 1 in a trillion trillion. Helpfully, she left a note, explaining that she predicted that you would take Right, and therefore she put the bomb in Left.
We will not assume that this “predictor” was any particular type of thing; in particular, it need not be a person.
In order to make the problem as written less confusing, we will assume that “Left” and “Right” for the predictor refer to the same things I’ve explained above.
Since there is no possible causal source for this information, as we have been magically instantiated into an impossible situation, the above quoted knowledge must also be logically certain.
Now, in thinking through this problem, we may pause and reason that this predictor seems helpful, in that it deliberately put the bomb in the box which it, to its best judgement, predicted we would not take. Further, the problem statement contains the sentence “Helpfully, she left a note, explaining that she predicted that you would take Right, and therefore she put the bomb in Left.”
This is strong evidence that the predictor, which recall we are not assuming is any particular type of thing (though this very fact doesn’t exclude the possibility that it is an agent or person-like being), is behaving as if it were helpful. So by default, we would like to trust the predictor. So when our logically certain knowledge says that she has a failure rate of 1 in 1,000,000,000,000,000,000,000,000, we would prefer to trust that as being correct.
Now, previously we noted the possibility that our experiential knowledge may be incorrect; that is, we may not in fact be faced with two open boxes, even though it looks like we are; or the boxes may not contain a bomb/be empty, or some other thing. This depends on the “magically placed” being’s confidence in its own ability to make inferences from the sensory information it receives.
What we observe, from the problem statement, is that there does appear to be a bomb in the Left box, and that the Right box does appear to be empty. However, we also observe that we would prefer this observation to be wrong, such that our logically certain knowledge that we must take a box is incorrect, because if we can avoid taking either box, then there is no risk of death, nor of losing $100.
By default, one would wish to continue a “happy life”; however, the problem statement tells us that we will never see another agent again. A rational agent can predict from this that their life will eventually become unhappy: happiness is known to be a temporary condition, and since other agents can be physically made from natural resources given enough time, never meeting another agent implies that physical resources and/or time are so limited that there is not enough to make another agent.
Making another agent when you are the only agent in existence is probably one of the hardest possible problems, but nevertheless, if you cannot do it, then you can predict that you will eventually run out of physical resources and time no matter what happens, and therefore you are in a finite universe regardless of anything else.
Since you have definitively located yourself in a finite universe, and you also have the logically certain knowledge that the simulator/predictor is long dead and appears to be helpful, this is logically consistent as a possible world-state.
Now we have to reason about whether the experiential evidence we are seeing has a chance of more than 1 in 1,000,000,000,000,000,000,000,000 of being incorrect. We know how to do this: just use probability theory, which can be reduced to a mechanical procedure.
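To make that comparison concrete, here is a minimal Python sketch of the mechanical procedure, under the loud assumption of an illustrative error rate for our senses (the problem gives no such number); it simply checks which of the three cases listed next we fall into.

```python
# A sketch of the "mechanical procedure": compare how often our senses
# mislead us against how often the predictor fails, and see which of the
# three cases enumerated in the list below applies.

PREDICTOR_FAILURE_RATE = 1e-24   # 1 in a trillion trillion (given in the problem)
SENSORY_ERROR_RATE = 1e-9        # assumed for illustration; not given in the problem


def which_to_trust(sensory_error_rate: float, predictor_failure_rate: float) -> str:
    """Map the comparison of the two error rates onto the three cases below."""
    if sensory_error_rate > predictor_failure_rate:
        return "case 1: experiential evidence is less reliable than the predictor"
    if sensory_error_rate < predictor_failure_rate:
        return "case 2: experiential evidence is more reliable than the predictor"
    return "case 3: exactly equal, so default to trusting the predictor"


print(which_to_trust(SENSORY_ERROR_RATE, PREDICTOR_FAILURE_RATE))
# Under these illustrative numbers this prints case 1: no realistic set of
# senses fails less often than 1 time in 10^24.
```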
However, since we have limited resources, before we actually do this computation, we should reason about what our decision would be in each case, since there are only three possibilities:
1. The experiential evidence is less likely, in which case the simulator probably hasn’t made an error, but the first part of the logically certain knowledge we have can be ignored.
2. The experiential evidence is more likely, in which case it’s possible that the simulator made a mistake, and although it appears trustworthy, we would be able to say that its prediction may be wrong and, separately, that perhaps its note was not helpful.
3. They are exactly equally likely, in which case we default to trusting the simulator.
In each case, what would be our action?
1. In this case, the logically certain knowledge that we must choose one of the boxes can be ignored, but it may still be correct. So we have to find some way to check independently whether it might be true without making use of the logically certain knowledge. One way is to take the same action as option two; in addition, you can split the propositions in the problem statement into atoms, take the power set, and consider the implications of each subset (a small sketch of this enumeration appears after this list). The total information obtained by this process will inform your decision. However, logical reasoning is just another form of obtaining evidence for non-logically-omniscient agents, and so in practice this option reduces to exactly the same set of possible actions as option 2, which follows:
2. In this case, all we have to go on is our current experiential knowledge, because the source of all our logically certain knowledge is the simulator, and since in this branch the experiential knowledge is more likely, the simulator is more likely to have made a mistake, and we must work out for ourselves what the actual situation is.
Depending on the results of that process, you might:
1. Just take Right, if you have $100 on you and you observe that you are under coercion of some form (including “too much” time pressure; i.e., if you do not have enough time to come to a decision).
2. Take neither box, because both are net negative.
3. Figure out what is going on and then come back and potentially disarm the bomb/box setup in some way. Potentially, in this scenario (or in 2), you may be in a universe which is not finite, and so even if you observe that you are completely alone, it may be possible to create other agents or to do whatever else interests you, and therefore have whichever life you choose for an indefinitely long time.
4. Take Left, and it does in fact result in the bomb exploding and you dying painfully, if your observations and reasoning process output that this is the best option for some reason.
5. Take Left, and nothing happens because the bomb-triggering process failed, and you save yourself $100.
For the purposes of this argument, we don’t need to (and realistically, can’t) know precisely which situations would cause outcome 4 to occur, because it seems extremely unlikely that any rational agent would deliberately produce this outcome unless it had a preference for dying painfully.
Trying to imagine the possible worlds in which this could occur is a fruitless endeavour, because the entire setup is already impossible. However, you will notice that we have already decided in advance on a small number of potential actions that we might take if we did find ourselves in this impossible scenario.
That in itself substantially reduces the resources required to make a decision if the situation were to somehow happen even though it’s impossible: we have reduced the problem to a choice among 5 actions rather than infinitely many, and also made the choice easier for our counterfactual self in this impossible world.
3. Case three (exactly equal likelihood) is the same action as case 1 (and hence also case 2), because the bomb setup gives us only negative utility options and the simulator setup has both positive and negative utility options, so we trust the simulator.
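As promised in case 1, here is a minimal Python sketch of the atom-and-power-set enumeration. The atomic propositions below are placeholders I have pulled out of the problem statement for illustration; a real agent would choose its own decomposition and actually evaluate each subset.

```python
# A sketch of "split the propositions into atoms and take the power set".
# The atoms below are an illustrative decomposition, not a canonical one.
from itertools import combinations

atoms = [
    "we must take one of the boxes",
    "Left contains a live bomb",
    "taking Left burns us slowly to death",
    "Right is empty but costs $100",
    "the predictor's failure rate really is 1 in 10**24",
    "the predictor's note is honest",
]


def power_set(props):
    """Yield every subset of the atomic propositions (2**len(props) of them)."""
    for r in range(len(props) + 1):
        yield from combinations(props, r)


for subset in power_set(atoms):
    # A real agent would work out here what this combination of truths
    # implies for the candidate actions listed above.
    pass

print(f"{2 ** len(atoms)} subsets to consider")  # 64 with these six atoms
```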
Now, the only situation in which Right is ever taken is if the simulator is wrong and you are under coercion.
Since in the problem statement it says:
You are the only person left in the universe. You have a happy life, but you know that you will never meet another agent again, nor face another situation where any of your actions will have been predicted by another agent. What box should you choose?
then by the problem definition, this cannot be the case unless you do not have adequate time to make a decision. So if you can come to a decision before the universe you are in ends, then Right will never be chosen, because the only possible type of coercion (since there are no other agents) is inadequate time/resources. If you can’t, then you might take Right.
However, you can just use FDT to make your decision near-instantly, since this has already been studied, and it outputs Left. Since this is the conclusion you have come to by your own chain of reasoning, you can pick Left.
But it may still be the case, independently of both of these things (since we are in an impossible world), that the bomb will go off.
So for an actual agent, the actual action you would take can only be described as “make the best decision you can at the time, using everything you know.”
Since we have reasoned about the possible set of actions ahead of time, we can choose from the (vaguely specified) set of 5 actions above, or we can do something else, given that we know about the reasoning we have already performed, and that if actually placed in the situation we would have more evidence which could inform our actions.
However, the set of 5 actions covers all the possibilities. We also know that we would only take Right if we can’t come to a decision in time, or if we are under coercion. In all other cases, we prefer to take Left, or take neither, or do something else entirely.
Since there are exactly two possible worlds under which we take Right, and an indefinitely large number in which we take Left, the maximum-utility option, which is to take Left, is correctly output by FDT.
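To spell out that utility comparison, here is a minimal Python sketch of the policy-level arithmetic usually given for “FDT outputs Left.” Under FDT-style reasoning, choosing Left means the predictor’s simulation of you also chose Left, so the bomb is only present in the one-in-10^24 failure case. The dollar-equivalent disutility assigned to burning slowly to death is an assumption made purely for illustration.

```python
# Policy-level expected-utility sketch behind "FDT outputs Left".
FAILURE_RATE = 1e-24     # the predictor's stated failure rate
BURN_TO_DEATH = -1e15    # assumed disutility of burning to death, in dollar equivalents
RIGHT_COST = -100        # the $100 you must pay to take the empty Right box

# Policy "take Left": under subjunctive dependence, the bomb is present
# only in the branch where the predictor failed.
eu_left = FAILURE_RATE * BURN_TO_DEATH + (1 - FAILURE_RATE) * 0

# Policy "take Right": you always pay $100.
eu_right = RIGHT_COST

print(f"EU(Left)  = {eu_left:+.2e}")   # about -1e-09, negligible
print(f"EU(Right) = {eu_right:+.2e}")
print("Higher expected utility:", "Left" if eu_left > eu_right else "Right")
```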
In the bomb question, what we need is a causal theory in which the ASI agent accurately gauges that a universe of one indicates loneliness and not in fact happiness, which is predicated on friendliness (at least for an ASI) (and I would be slightly concerned as an external observer as to why the universe was reduced to a single agent, if it were not due to entropy), then figures out that the perfect predictor was a prior ASI, not from that universe, giving it a clue, and then, adding all its available power to the bomb, following Asimov, says: “LET THERE BE LIGHT!” And with an almighty bang (and perhaps, even with all that extra explosive power, no small pain) there was light--