Not sure about FDT (fancy decision theories), but there are only two possible outcomes here:
Paul lives in a world with psychopaths,
The world has no psychopaths, including no Paul.
There is no possible world where Paul lives and all psychopaths are dead, so “be much better to live in a world with no psychopaths” is an extraneous preference with no bearing on whether to press the button. Sort of like “it would be nice to live in a world with flying cars that look like unicorns”.
The real question is which of the two possible worlds Paul prefers, and the answer is quite clear: if Paul is a psychopath, he strongly prefers living to dying, so he does not press the button; and if Paul is not a psychopath, he will not press the button anyway. There is nothing here that needs a fancy decision theory, just count the possible worlds.
No, Paul can be wrong about only psychopaths pushing the button.
How confident is Paul about that? Oh right, “quite”. Is that a credence of 80%? 95%? 99.99%?
How much more strongly does Paul prefer living in a world with psychopaths to dying? Oh right, “very”. Is that a utility of 2x? 100x? 10000x?
What is Paul’s prior credence that he is a psychopath according to the button’s implementation? 0.1%? 1%? 5%? 50%?
… and so on for other variables that are required for every logical decision theory. It doesn’t make sense to ask, as the original post does, what various logical decision theories answer when the question is stated only in vague terms that are compatible with any answer.
Of course Paul could be wrong, and then you need to calculate probabilities, a trivial calculation that does not depend on the chosen decision theory. But the problem statement, as given, does not specify any of this, only that he is sure that only a psychopath would press the button, so take it as 100% confidence and 100% accuracy, for simplicity. The point does not change: you need a good specification of the problem, and once you have it, the calculation is evaluating the probability of each world, multiplying by its utility, and declaring the agent that picks the world with the highest EV “rational”.
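For concreteness, here is a minimal sketch of that calculation; the numbers are placeholders I'm making up, since the original post specifies none of the credences or utilities:

```python
# A rough sketch of the 'count worlds, multiply by utilities' calculation
# described above. All numbers are made up; the original post gives none.

p_psychopath_if_press = 0.95     # Paul's credence that he is a psychopath, given that he presses
u_alive_with_psychopaths = 1.0   # Paul alive, psychopaths still around
u_alive_no_psychopaths = 2.0     # Paul alive, psychopaths gone (only reachable if he isn't one)
u_dead = -100.0                  # Paul was a psychopath and pressed the button

ev_press = (p_psychopath_if_press * u_dead
            + (1 - p_psychopath_if_press) * u_alive_no_psychopaths)
ev_dont_press = u_alive_with_psychopaths

print(f"EV(press) = {ev_press:.2f}, EV(don't press) = {ev_dont_press:.2f}")
# In the 100%-confidence limit assumed above, the 'alive and no psychopaths'
# world has probability zero and the comparison reduces to u_dead vs
# u_alive_with_psychopaths, i.e. don't press.
```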
This looks like a point of view that denies the value of two-boxing in Newcomb’s Problem, which shouldn’t interfere with remaining aware of what CDT would do and why, a useful thing for building saner variants of CDT.
Yes, there is no value in two-boxing, because there is no possible world where a two-boxer wins (provided the predictor is perfect), or the probability of such a world falls off as the predictor’s accuracy improves (when the predictor is imperfect). One doesn’t need a saner version of EDT or CDT: an agent who counts worlds, probabilities, and utilities, without involving counterfactuals, always has the best EV.
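A quick sketch of that world-counting comparison as a function of predictor accuracy, using the standard $1,000 / $1,000,000 payoffs (the accuracy values are just examples):

```python
# World-counting view of Newcomb's problem at various predictor accuracies.

def ev_one_box(accuracy):
    # Predictor right -> the opaque box is full; predictor wrong -> it is empty.
    return accuracy * 1_000_000

def ev_two_box(accuracy):
    # Predictor right -> the opaque box is empty; predictor wrong -> both boxes pay out.
    return accuracy * 1_000 + (1 - accuracy) * 1_001_000

for acc in (0.5, 0.9, 0.99, 1.0):
    print(f"accuracy {acc}: one-box EV {ev_one_box(acc):,.0f}, two-box EV {ev_two_box(acc):,.0f}")
# As accuracy approaches 1, the 'two-boxer wins' worlds get probability ~0,
# which is the sense in which their probability falls off above.
```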
an agent who counts worlds, probabilities, and utilities, without involving counterfactuals, always has the best EV.
Sorry, can you express this in terms like V(A) = ∑_j P(O_j|A) U(O_j)? The main disagreement between decision theories like EDT and CDT is which worlds they think are accessible, and I am not confident I could guess what you’d think the answer is to an arbitrary problem.
I tried in my old post https://www.lesswrong.com/posts/TQvSZ4n4BuntC22Af/decisions-are-not-about-changing-the-world-they-are-about. Basically, two-boxers equivocate between possible worlds and deny the premise that the Predictor can predict them ahead of time, regardless of what they do later. They think that a low-probability world becomes accessible by jumping from a high-probability world into a non-existent low-probability world after the boxes are already set.
Cool, thanks for the link; I found jessicata’s comment thread there helpful.
I agree that CDT overestimates the accessibility of worlds. I think one way to think about EDT is that it, too, is just counting worlds, probabilities, and utilities, but calculating the probabilities differently, in a more UDT-ish way.
Consider another variant of this problem, where there are many islands, and the button only kills the psychopaths on its island. If Paul has a historical record that so far, all of the previous buttons that have been pressed were pressed by psychopaths, Paul might nevertheless think that his choice to press the button stems from a different source than psychopathy, and thus it’s worth pressing the button. [Indeed, the spicy take is that EDT doesn’t press the button, CDT does for psychopathic reasons and so dies, and FDT does for non-psychopathic reasons, and so gets the best outcome. ;) ]
Yes, if Paul thinks he might not be a psychopath (and so would survive pressing), and has a probability associated with that, he would include this possible world in the calculation… obviously? Though this requires further specification of how much he values his life vs. life with/without psychopaths around. If he values it infinitely, as most psychopaths presumably do, then he would not press the button, on the off chance that he is wrong. If the value is finite, then there is a break-even probability where he is indifferent to pressing the button. I don’t understand how it is related to a decision theory, it’s just world counting and EV calculation. I must be missing something, I assume.
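A sketch of that break-even point, reusing the same kind of placeholder utilities as before (again, none of these values come from the post):

```python
# Break-even credence at which Paul is indifferent between pressing and not.

u_alive_with_psychopaths = 1.0
u_alive_no_psychopaths = 2.0
u_dead = -100.0

# Paul is indifferent when
#   p * u_dead + (1 - p) * u_alive_no_psychopaths == u_alive_with_psychopaths,
# where p is his credence that pressing kills him. Solving for p:
p_break_even = ((u_alive_no_psychopaths - u_alive_with_psychopaths)
                / (u_alive_no_psychopaths - u_dead))
print(f"break-even credence: {p_break_even:.4f}")
# About 0.0098 with these numbers; as u_dead goes to minus infinity the
# break-even credence goes to zero, matching the 'values his life infinitely'
# case where he never presses.
```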
Agreed that we need real-valued utilities to make clear recommendations in the case of uncertainty.
I don’t understand how it is related to a decision theory, it’s just world counting and EV calculation. I must be missing something, I assume.
For all of the consequentialist decision theories, I think you can describe what they’re doing as attempting to argmax a probability-weighted sum of utilities across possible worlds, and they differ on how they think actions influence probabilities / their underlying theory of how they specify ‘possible worlds’ and thus what universe they think they’re in. [That is, I think the interesting bit is the part you seem to be handling as an implementation detail.]
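A toy sketch of that framing, where the same argmax machinery is fed two different probability assignments; the toy Newcomb numbers are invented purely for illustration:

```python
# Each theory argmaxes a probability-weighted sum of utilities over worlds;
# they differ only in which distribution over worlds they attach to an action.

U = {"full/one-box": 1_000_000, "empty/one-box": 0,
     "full/two-box": 1_001_000, "empty/two-box": 1_000}

def expected_utility(p_worlds):
    return sum(p * U[w] for w, p in p_worlds.items())

def best_action(p_given_action):
    return max(p_given_action, key=lambda a: expected_utility(p_given_action[a]))

# EDT-style: condition on the action, so the (99%-accurate) predictor tracks it.
p_edt = {"one-box": {"full/one-box": 0.99, "empty/one-box": 0.01},
         "two-box": {"full/two-box": 0.01, "empty/two-box": 0.99}}

# Naive CDT-style: box contents are already fixed (say 50/50) and treated as
# independent of the action.
p_cdt = {"one-box": {"full/one-box": 0.5, "empty/one-box": 0.5},
         "two-box": {"full/two-box": 0.5, "empty/two-box": 0.5}}

print("EDT-style probabilities pick:", best_action(p_edt))   # one-box
print("CDT-style probabilities pick:", best_action(p_cdt))   # two-box
```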
That’s not clear until you develop them.
Incidentally, this is an increasingly dubious objective. But to see why it’s a bad idea in practice, it’s helpful to be aware of the way it looks like a very good idea. (Regardless, it’s obviously relevant for this post.)
OK, I read the last one (again, after all these years), and I have no idea how it is applicable. It seems to be about the definition of probability, Dutch-booking and such… nothing to do with the question at hand. The one before that is about how a “wrapper-mind”, i.e. a fixed-goal AGI is bad… Which is indeed correct, but… irrelevant? It has the best EV by its own metric?
(The second paragraph was irrelevant to the comment I was replying to; I thought the “incidentally” and the inverted-in-context “it’s obviously relevant” (it’s maximization of EV that’s obviously relevant, unlike the objections to it I’m voicing; maybe this was misleading) made that framing clear?)
I was commenting on how “having the best EV”, the classical dream of decision theory, has recently come into question because of the Goodhart’s Curse issue, and on how it might be good to look for decision theories that do something else. The wrapper-minds post is pointing at the same problem from a very different framing. Mild optimization is a sketch of the kind of thing that might make it better, and includes more specific suggestions like quantilization. (I currently like “moral updatelessness” for this role: a variant of UDT that bargains from a position of moral ignorance, not just epistemic ignorance, among its more morally competent successors, with mutually counterfactual, that is discordant, but more developed moralities/values/goals.) The “coherent decisions” post is just a handy reference for why EV maximization is the standard go-to thing, and might still remain so in the limit of reflection (time), but possibly not even then.
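A rough toy rendering of quantilization, one of the mild-optimization ideas mentioned above; this is my own illustrative sketch, not the formulation from the linked posts:

```python
# Quantilization sketch: instead of argmaxing a possibly Goodhart-cursed proxy,
# sample from the top q fraction of a trusted base distribution of actions.

import random

def quantilize(actions, proxy_utility, q=0.1, rng=random):
    """Pick uniformly among the top q fraction of `actions` ranked by the proxy,
    with the uniform distribution over `actions` playing the role of the base."""
    ranked = sorted(actions, key=proxy_utility, reverse=True)
    top = ranked[:max(1, int(len(ranked) * q))]
    return rng.choice(top)

# An argmaxer would always take the single proxy-best action, even if the proxy
# is most wrong exactly there; a quantilizer gives up some proxy value in
# exchange for staying close to the base distribution.
print(quantilize(range(100), proxy_utility=lambda a: a, q=0.1))
```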
The relevant part (to the “saner CDT” point) is the first paragraph, which is mostly about Troll Bridge and logical decision theory. The last post of the sequence has a summary/retrospective. Personally, I mostly like CDT for introducing surgery; fictional, laws-of-physics-defying counterfactuals seem inescapable in some framings that are not just being dumb like vanilla CDT. In particular, when considering interventions through approximate predictions of the agent. (How do you set all of these to some possible decision, when all you know is the real world, which might have the actual decision you didn’t make yet in its approximate models of you? You might need to “lie” in the counterfactual with fictional details, to make models of your behavior created by others predict what you are considering doing, instead of what you actually do and can’t predict or infer from the actual models they’ve already made of you. Similarly to how you know a Chess AI will win without knowing how, you know that models of your behavior will predict your action, without knowing how. So you are not inferring their predictions from their details, you are just editing them into the counterfactual.) This might even be relevant to CEV in that moral updatelessness setting I’ve mentioned, though that’s pure speculation at this point.
a fixed-goal AGI is bad… Which is indeed correct, but… irrelevant? It has the best EV by its own metric?
Nobody knows how to formulate it like that! EV maximization is so entrenched as obviously the thing to do that the “obviously, it’s just EV maximization for something else” response is instinctual, but that doesn’t seem to be the case.
And if maximization is always cursed (goals are always proxy goals, even as they become increasingly accurate, particularly around the actual environment), it’s not maximization that decision theory should be concerned with.
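A tiny illustration of the worry about maximizing proxy goals; the numbers and noise model are made up purely to show the effect:

```python
# Even when a proxy tracks the true goal well on average, hard argmaxing the
# proxy systematically selects points where the proxy's error is large and
# positive (optimizer's curse / Goodhart).

import random

random.seed(0)
candidates = range(1001)
true_u = {a: -abs(a - 500) / 100 for a in candidates}              # true goal: be near 500
proxy_u = {a: true_u[a] + random.gauss(0, 2) for a in candidates}  # proxy = true + noise

best_by_proxy = max(candidates, key=proxy_u.get)
print("proxy argmax:", best_by_proxy,
      "proxy value:", round(proxy_u[best_by_proxy], 2),
      "true value:", round(true_u[best_by_proxy], 2))
print("true optimum: 500, true value:", true_u[500])
# The proxy argmax reliably overstates how good its pick is, and the gap grows
# with the strength of the optimization and the tails of the proxy error.
```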
Thanks. I will give them a read. After all, smarter people than me spent more time than I did thinking about this. There is a fair chance that I am missing something.