This sort of reasoning makes sense if you must decide which box to take prior to learning the details of your situation (i.e., from behind a “veil of ignorance”), and cannot change your choice even after you discover that, e.g., taking the Left box will kill you. In such a case, sure, you can say “look, it’s a gamble, and I did lose big this time, but it was a very favorable gamble, with a clearly positive expected outcome”. (Though see Robyn Dawes’ commentary on such “skewed” gambles; we can let that pass here.)
But that’s not the case here. Here, you’ve learned that taking the Left box kills you, but you still have a choice! You can still choose to take Right! And live!
Yes, FDT insists that actually, you must choose in advance (by “choosing your algorithm” or what have you), and must stick to the choice no matter what. But that is a feature of FDT, it is not a feature of the scenario! The scenario does not require that you stick to your choice. You’re free to take Right and live, no matter what your decision theory says.
So when selecting a decision theory, you may of course feel free to pick the one that says that you must pick Left, and knowingly burn to death, while I will pick the one that says that I can pick whatever I want. One of us will be dead, and the other will be “smiling from atop a heap of utility”.
(“But what about all those other possible worlds?”, you may ask. Well, by construction, I don’t find myself in any of those, so they’re irrelevant to my decision now, in the actual world.)
Yes, FDT insists that actually, you must choose in advance (by “choosing your algorithm” or what have you), and must stick to the choice no matter what. But that is a feature of FDT, it is not a feature of the scenario! The scenario does not require that you stick to your choice. You’re free to take Right and live, no matter what your decision theory says.
Well, I’d say FDT recognizes that you do choose in advance, because you are predictable. Apparently you have an algorithm running that makes these choices, and the predictor simulates that algorithm. It’s not that you “must” stick to your choice. It’s about constructing a theory that consistently recommends the actions that maximize expected utility.
I know I keep repeating that—but it seems that’s where our disagreement lies. You look at which action is best in a specific scenario; I look at which decision theory produces the most utility. An artificial superintelligence running a decision theory can’t choose freely no matter what the decision theory says: running the decision theory means doing what it says.
An artificial superintelligence running a decision theory can’t choose freely no matter what the decision theory says: running the decision theory means doing what it says.
That seems like an argument against “running a decision theory”, then!
Now, that statement may seem like it doesn’t make sense. I agree! But that’s because, as I see it, your view doesn’t make sense; what I just wrote is consistent with what you write…
Clearly, I, a human agent placed in the described scenario, could choose either Left or Right. Well, then we should design our AGI in such a way that it also has this same capability.
Obviously, the AGI will in fact (definitionally) be running some algorithm. But whatever algorithm that is, it ought to be one that results in it being able to choose (and in fact choosing) Right in the “Bomb” scenario.
What decision theory does that correspond to? You tell me…

CDT

CDT indeed Right-boxes, thereby losing utility.
That seems like an argument against “running a decision theory”, then!
Now, that statement may seem like it doesn’t make sense. I agree! But that’s because, as I see it, your view doesn’t make sense; what I just wrote is consistent with what you write…
Exactly, it doesn’t make sense. It is in fact nonsense, unless you are saying it’s impossible to specify a coherent, utility-maximizing decision theory at all?
Btw, please explain how it’s consistent with what I wrote, because it seems obvious to me it’s not.
And if I select FDT, I would be the one “smiling from atop a heap of utility” in (10^24 − 1) out of 10^24 worlds.
But that’s not the case here. Here, you’ve learned that taking the Left box kills you, but you still have a choice! You can still choose to take Right! And live!
Yes, but the point is to construct a decision theory that recommends actions in a way that maximizes expected utility. Recommending left-boxing does that, because it saves you $100 in virtually every world. That’s it, really. You keep focusing on that 1 out of 10^24 possibility where you burn to death, but that doesn’t take anything away from FDT. Like I said: it’s not about which action to take, let alone which action in such an improbable scenario. It’s about what decision theory we need.
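To make that concrete, here is a rough sketch of the ex-ante comparison, in Python. The 1-in-10^24 error rate and the $100 cost are from the scenario; the dollar value assigned to burning to death (VALUE_OF_DEATH below) is purely an illustrative assumption.

```python
# A rough sketch of the ex-ante expected-utility comparison (illustrative only).
ERROR_RATE = 1e-24       # the predictor's failure rate, from the scenario
COST_OF_RIGHT = 100      # taking Right costs $100, from the scenario
VALUE_OF_DEATH = -1e12   # assumed: treat burning to death as minus a trillion dollars

# Policy "take Left": the predictor almost always predicts Left and leaves it
# empty; only with probability ERROR_RATE is there a bomb waiting for you.
eu_left_boxing = (1 - ERROR_RATE) * 0 + ERROR_RATE * VALUE_OF_DEATH

# Policy "take Right": you pay the $100 in every world.
eu_right_boxing = -COST_OF_RIGHT

print(f"left-boxing policy:  {eu_left_boxing:.3e}")  # about -1e-12 dollars
print(f"right-boxing policy: {eu_right_boxing}")     # -100 dollars
```

On these numbers, the left-boxing policy costs about 10^-12 dollars in expectation; given the stated error rate, it comes out ahead of paying the $100 ex ante unless the disutility you assign to dying exceeds roughly 10^26 dollars.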
And if I select FDT, I would be the one “smiling from atop a heap of utility” in (10^24 − 1) out of 10^24 worlds.
So you say. But in the scenario (and in any situation we actually find ourselves in), only the one, actual, world is available for inspection. In that actual world, I’m the one with the heap of utility, and you’re dead.
Who knows what I would do in any of those worlds, and what would happen as a result? Who knows what you would do?
In the given scenario, FDT loses, period, and loses really badly and, what is worse, loses in a completely avoidable manner.
You keep focusing on that 1 out of 10^24 possibility where you burn to death, but that doesn’t take anything away from FDT.
As I said, this reasoning makes sense if, at the time of your decision, you don’t know what possibility you will end up with (and are thus making a gamble). It makes no sense at all if you are deciding while in full possession of all relevant facts.
Like I said: it’s not about which action to take, let alone which action in such an improbable scenario. It’s about what decision theory we need.
Totally, and the decision theory we need is one that doesn’t make such terrible missteps!
Of course, it is possible to make an argument like: “yes, FDT fails badly in this improbable scenario, but all other available decision theories fail worse / more often, so the best thing to do is to go with FDT”. But that’s not the argument being made here—indeed, you’ve explicitly disclaimed it…
So you say. But in the scenario (and in any situation we actually find ourselves in), only the one, actual, world is available for inspection. In that actual world, I’m the one with the heap of utility, and you’re dead.
No. We can inspect more worlds. We know what happens given the agent’s choice and the predictor’s prediction. There are multiple paths, each with its own probability. The problem description focuses on that one world, yes. But the point remains—we need a decision theory, we need it to recommend an action (left-boxing or right-boxing), and left-boxing gives the most utility if we consider the bigger picture.
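Sketched out, the paths look something like this (the probabilities are just the scenario’s stated error rate; the little helper below is only for illustration):

```python
# Enumerate the predictor/agent paths for each fixed policy (a sketch).
ERROR = 1e-24  # the scenario's stated error rate

def paths(policy):
    """Yield (probability, bomb_in_left, outcome) for an agent committed to `policy`."""
    other = "Left" if policy == "Right" else "Right"
    for predicted, prob in ((policy, 1 - ERROR), (other, ERROR)):
        bomb_in_left = (predicted == "Right")  # she puts the bomb in Left iff she predicted Right
        if policy == "Left":
            outcome = "burn to death" if bomb_in_left else "live for free"
        else:
            outcome = "pay $100 and live"
        yield prob, bomb_in_left, outcome

for policy in ("Left", "Right"):
    for prob, bomb, outcome in paths(policy):
        print(f"policy={policy:<5}  P={prob:.0e}  bomb in Left={bomb}:  {outcome}")
```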
Totally, and the decision theory we need is one that doesn’t make such terrible missteps!
Do you agree that recommending left-boxing before the predictor makes its prediction is rational?
No. We can inspect more worlds. We know what happens given the agent’s choice and the predictor’s prediction.
Well, no. We can reason about more worlds. But we can’t actually inspect them.
Here’s the question I have, though, which I have yet to see a good answer to. You say:
But the point remains—we need a decision theory, we need it to recommend an action (left-boxing or right-boxing), and left-boxing gives the most utility if we consider the bigger picture.
But why can’t our decision theory recommend “choose Left if and only if it contains no bomb; otherwise choose Right”? (Remember, the boxes are open; we can see what’s in there…)
Do you agree that recommending left-boxing before the predictor makes its prediction is rational?
I think that recommending no-bomb-boxing is rational. Or, like: “Take the left box, unless of course the predictor made a mistake and put a bomb in there, in which case, of course, take the right box.”
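A minimal sketch of that policy, with a made-up name for it:

```python
# The proposed "no-bomb-boxing" policy: the boxes are open, so condition on what you see.
def no_bomb_boxer(bomb_visible_in_left: bool) -> str:
    """Take Left unless you can see a bomb in it; in that case take Right."""
    return "Right" if bomb_visible_in_left else "Left"

# The two situations the predictor can leave you in:
for bomb_visible_in_left in (False, True):
    choice = no_bomb_boxer(bomb_visible_in_left)
    outcome = "live for free" if choice == "Left" else "pay $100 and live"
    print(f"bomb visible in Left: {bomb_visible_in_left} -> take {choice} -> {outcome}")
```

Whatever state the predictor leaves the boxes in, this policy never takes the box with the bomb in it.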
As to inspection, maybe I’m not familiar enough with the terminology there.
Re your last point: I was just thinking about that too. And strangely enough I missed that the boxes are open. But wouldn’t the note be useless in that case?
I will think about this more, but it seems to me your decision theory can’t recommend “Left-box, unless you see a bomb in Left”, and FDT doesn’t do this. The problem is, in that case the prediction influences what you end up doing. What if the predictor is malevolent, and predicts you choose Right, placing the bomb in Left? It could make you lose $100 easily. Maybe if you believed the predictor to be benevolent?
And strangely enough I missed that the boxes are open.
Well, uh… that is rather an important aspect of the scenario…
… it seems to me your decision theory can’t recommend “Left-box, unless you see a bomb in Left” …
Why not?
The problem is, in that case the prediction influences what you end up doing.
Yes, it certainly does. And that’s a problem for the predictor, perhaps, but why should it be a problem for me? People condition their actions on knowledge of past events (including predictions of their actions!) all the time.
What if the predictor is malevolent, and predicts you choose Right, placing the bomb in Left? It could make you lose $100 easily.
Indeed, the predictor doesn’t have to predict anything to make me lose $100; it can just place the bomb in the left box, period. This then boils down to a simple threat: “pay $100 or die!”. Hardly a tricky decision theory problem…
Well, uh… that is rather an important aspect of the scenario…
Sure. But given the note, I had the knowledge needed already, it seems. But whatever.
Indeed, the predictor doesn’t have to predict anything to make me lose $100; it can just place the bomb in the left box, period. This then boils down to a simple threat: “pay $100 or die!”. Hardly a tricky decision theory problem…
Didn’t say it was a tricky decision problem. My point was that your strategy is easily exploitable and may therefore not be a good strategy.
If your strategy is “always choose Left”, then a malevolent “predictor” can put a bomb in Left and be guaranteed to kill you. That seems much worse than being mugged for $100.
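To spell that out, here is a toy comparison, assuming (purely for the sake of the comparison) a hostile predictor who may place the bomb wherever she likes:

```python
# A toy worst-case comparison of two policies against a hostile predictor who
# (by assumption, for this comparison only) may place the bomb wherever she likes.
def always_left(bomb_in_left: bool) -> str:
    return "Left"

def no_bomb_boxer(bomb_in_left: bool) -> str:
    return "Right" if bomb_in_left else "Left"

def worst_case(policy) -> float:
    """Payoff in the worst box-state a hostile predictor could arrange."""
    payoffs = []
    for bomb_in_left in (False, True):
        choice = policy(bomb_in_left)
        if choice == "Left" and bomb_in_left:
            payoffs.append(float("-inf"))  # take the bomb: burn to death
        elif choice == "Right":
            payoffs.append(-100)           # pay the $100
        else:
            payoffs.append(0)              # take the empty Left box for free
    return min(payoffs)

print(worst_case(always_left))    # -inf: she can guarantee your death
print(worst_case(no_bomb_boxer))  # -100: the worst she can force is the $100 loss
```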
The problem description explicitly states the predictor doesn’t do that, so no.

I don’t see how that’s relevant. In the original problem, you’ve been placed in this weird situation against your will, where something bad will happen to you (either the loss of $100 or … death). If we’re supposing that the predictor is malevolent, she could certainly do all sorts of things… are we assuming that the predictor is constrained in some way? Clearly, she can make mistakes, so that opens up her options to any kind of thing you like. In any case, your choice (by construction) is as stated: pay $100, or die.

You don’t see how the problem description preventing it is relevant?

The description doesn’t prevent malevolence, but it does prevent putting a bomb in Left if the agent left-boxes.
Yes, FDT insists that actually, you must choose in advance (by “choosing your algorithm” or what have you), and must stick to the choice no matter what. But that is a feature of FDT, it is not a feature of the scenario!
FDT doesn’t insist on this at all. FDT recognizes that IF your decision procedure is modelled prior to your current decision, then you did in fact choose in advance. If an FDT’er playing Bomb doesn’t believe her decision procedure was being modelled this way, she wouldn’t take Left!
FDT recognizes it if and only if it is a feature of the scenario. FDT isn’t insisting that the world be a certain way. I wouldn’t be a proponent of it if it did.
If a model of you predicts that you will choose A, but in fact you can choose B, and want to choose B, and do choose B, then clearly the model was wrong. Thinking “the model says I will choose A, therefore I have to (???) choose A” is total nonsense.
(Is there some other way to interpret what you’re saying? I don’t see it.)
“Thinking ‘the model says I will choose A, therefore I have to (???) choose A’ is total nonsense.”
I choose whatever I want, knowing that it means the predictor predicted that choice.
In Bomb, if I choose Left, the predictor will have predicted that (given subjunctive dependence). Yes, the predictor said it predicted Right in the problem description; but if I choose Left, that simply means the problem ran differently from the start. It means that, starting from the beginning, the predictor predicts I will choose Left, doesn’t put a bomb in Left, doesn’t leave the “I predicted you will pick Right” note (but maybe leaves an “I predicted you will pick Left” note), and then I indeed choose Left, letting me live for free.
If the model is in fact (near) perfect, then choosing B means the model chose B too. That may seem like changing the past, but it really isn’t; that’s just the confusing way these problems are set up.
Claiming you can choose something a (near) perfect model of you didn’t predict is like claiming two identical calculators can give a different answer to 2 + 2.
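As a loose sketch of what I mean (the particular procedure below is an arbitrary stand-in, and this glosses over the fact that in Bomb the prediction itself helps determine what the agent later sees):

```python
# A predictor that runs the very same deterministic procedure the agent runs
# cannot come apart from the agent, any more than two identical calculators
# can disagree about 2 + 2. The procedure itself is an arbitrary stand-in.
def decision_procedure(observation: str) -> str:
    """Some fixed, deterministic mapping from what is seen to a choice."""
    return "Right" if observation == "bomb in Left" else "Left"

def predictor(observation: str) -> str:
    # The predictor simply runs the same procedure on the same input.
    return decision_procedure(observation)

for observation in ("bomb in Left", "no bomb in Left"):
    assert decision_procedure(observation) == predictor(observation)  # they can never differ
    print(observation, "->", decision_procedure(observation))
```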
This sort of reasoning makes sense if you must decide on which box to take prior to learning the details of your situation (a.k.a. in a “veil of ignorance”), and cannot change your choice even after you discover that, e.g., taking the Left box will kill you. In such a case, sure, you can say “look, it’s a gamble, and I did lose big this time, but it was a very favorable gamble, with a clearly positive expected outcome”. (Although see Robyn Dawes’ commentary on such “skewed” gambles. However, we can let this pass here.)
But that’s not the case here.
It is the case, in a way. Otherwise the predictor could not have predicted your action. I’m not saying you actively decide what to do beforehand, but apparently you are running a predictable decision procedure.