It’s obvious how ordinary conditionals are important for planning and acting (you design a bridge so that it won’t fall down if someone drives a heavy lorry across it; you don’t cross a bridge because you think the troll underneath will eat you if you cross), but counterfactuals? I mean, obviously you can put them into a particular problem…
All the various reasoning behind a decision could involve material conditionals, probabilistic conditionals, logical implication, linguistic conditionals (whatever those are), linguistic counterfactuals, decision-theoretic counterfactuals (if those are indeed different as I claim), etc etc etc. I’m not trying to make the broad claim that counterfactuals are somehow involved.
The claim is about the decision algorithm itself. The claim is that the way we choose an action is by evaluating a counterfactual (“what happens if I take this action?”). Or, to be a little more psychologically realistic, the cached values that determine which actions we take are estimated counterfactual values.
What is the content of this claim?
A decision procedure is going to have (cached-or-calculated) value estimates which it uses to make decisions. (At least, most decision procedures work that way.) So the content of the claim is about the nature of these values.
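To make that concrete, here is a minimal toy sketch (mine, not anything from the post) of the skeleton all the theories in this discussion share; the entire dispute is about what goes in the `estimated_value` slot.

```python
# Toy skeleton: everything discussed here agrees on this much.
def choose_action(actions, estimated_value):
    """Pick the action with the highest (cached or freshly computed) value estimate."""
    return max(actions, key=estimated_value)

# The disagreement is entirely about what `estimated_value` should be:
#   evidential (EDT): a conditional expectation, roughly E[U | A = a]
#   causal/counterfactual (CDT): an interventional value, roughly E[U | do(A = a)]
print(choose_action(["cross", "stay"], lambda a: {"cross": 10, "stay": 0}[a]))  # "cross"
```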
If the values act like Bayesian conditional expectations, then the claim that we need counterfactuals to make decisions is considered false. This is the claim of evidential decision theory (EDT).
If the values are still well-defined for known-false actions, then they’re counterfactual. So, a fundamental reason why MIRI-type decision theory uses counterfactuals is to deal with the case of known-false actions.
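And here is a toy illustration (my own code and numbers) of why known-false actions are the sticking point: a Bayesian conditional expectation divides by the probability of the action, so it is simply undefined for an action the agent is certain it won’t take, whereas a counterfactual value can still be defined.

```python
# Joint distribution over (action, utility), as {(action, utility): probability}.
joint = {("cross", 10): 0.0,   # "cross" is known-false: the agent assigns it probability 0
         ("stay",   0): 1.0}

def conditional_expected_utility(joint, action):
    """EDT-style value: E[U | A = action]. Undefined (0/0) when P(action) = 0."""
    p_action = sum(p for (a, _u), p in joint.items() if a == action)
    if p_action == 0:
        raise ValueError("E[U | A=a] is undefined for a known-false action")
    return sum(u * p for (a, u), p in joint.items() if a == action) / p_action

# A counterfactual value comes from a separate "what would happen if" model,
# so it stays well-defined even when P(action) = 0. (Hypothetical numbers.)
counterfactual_value = {"cross": 10, "stay": 0}

print(counterfactual_value["cross"])          # 10: still defined
try:
    print(conditional_expected_utility(joint, "cross"))
except ValueError as e:
    print(e)                                  # undefined: the conditional breaks down
```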
However, academic decision theorists have used (causal) counterfactuals for completely different reasons (i.e., because they supposedly give better answers). This is the claim of causal decision theory (CDT).
My claim in the post, of course, is that the estimated values used to make decisions should match the EDT expected values almost all of the time, but should not be responsive to the same kinds of reasoning that the EDT values are responsive to, and so should not actually be evidential.
Could you give a couple of examples where counterfactuals are relevant to planning and acting without having been artificially inserted?
It sounds like you’ve kept a really strong assumption of EDT in your head, so strong that you couldn’t even imagine why non-evidential reasoning might be part of an agent’s decision procedure. My example is the Troll Bridge: conditional reasoning (whether proof-based or expectation-based) ends up not crossing the bridge, whereas counterfactual reasoning can cross (if we get the counterfactuals right).
The thing you call “proof-based decision theory” involves trying to prove things of the form “if I do X, I will get at least Y utility” but those look like ordinary conditionals rather than counterfactuals to me too.
Right. In the post, I argue that using proofs like this is more like a form of EDT rather than CDT, so, I’m more comfortable calling this “conditional reasoning” (lumping it in with probabilistic conditionals).
The Troll Bridge is supposed to show a flaw in this kind of reasoning, suggesting that we need counterfactual reasoning instead (at least, if “counterfactual” is broadly understood to be anything other than conditional reasoning—a simplification which mostly makes sense in practice).
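For concreteness, here is a heavily simplified sketch of this style of proof-based choice (my own illustration; `provable` is just a placeholder for bounded proof search in some fixed theory such as PA). Notice that the statement being proved is an ordinary material conditional, which is part of why I file this under conditional rather than counterfactual reasoning.

```python
# Simplified sketch of proof-based decision-making (not a real implementation).
def provable(statement: str) -> bool:
    """Placeholder for a bounded proof search in some fixed formal theory."""
    raise NotImplementedError

def proof_based_choice(actions, utility_levels):
    """Take the action with the best provable guarantee "Agent() = a -> U >= u"."""
    best_action, best_guarantee = None, float("-inf")
    for a in actions:
        # Try the strongest guarantee first for this action.
        for u in sorted(utility_levels, reverse=True):
            if provable(f"Agent() = {a!r} -> U >= {u}"):
                if u > best_guarantee:
                    best_action, best_guarantee = a, u
                break
    return best_action
```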
…though this is pure prejudice and maybe there are better reasons for it than I can currently imagine: we want agents that can act in the actual world, about which one can generally prove precisely nothing of interest
Oh, yeah, proof-based agents can technically do anything which regular expectation-based agents can do. Just take the probabilistic model the expectation-based agents are using, and then have the proof-based agent take the action for which it can prove the highest expectation. This isn’t totally sleight of hand; the proof-based agent will still display some interesting behavior if it is playing games with other proof-based agents, dealing with Omega, etc.
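A sketch of that reduction, with the same caveats as before (my own illustration; the `model` format is made up): since the model’s expected-utility calculation is a concrete computation, a statement like “EU(a) ≥ u” is provable just by carrying the computation out, so the proof-based agent ends up making the same choices.

```python
# model[action] is a list of (probability, utility) pairs taken from the
# expectation-based agent's own probabilistic model (illustrative format).
def model_expected_utility(action, model):
    return sum(p * u for p, u in model[action])

def emulate_expectation_based_agent(actions, model):
    # "Find the action with the best provable expectation" collapses into
    # "evaluate the model and take the argmax", since the evaluation itself
    # constitutes the proof.
    return max(actions, key=lambda a: model_expected_utility(a, model))
```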
At any rate, right now “passing Troll Bridge” looks to me like a problem applicable only to a very specific kind of decision-making agent, one I don’t see any particular reason to think has any prospect of ever being relevant to decision-making in the actual world—but I am extremely aware that this may be purely a reflection of my own ignorance.
Even if proof-based decision theory didn’t generalize to handle uncertain reasoning, the Troll Bridge would also apply to expectation-based reasoners if their expectations respect logic. So the narrow class of agents for whom it makes sense to ask “does this agent pass the Troll Bridge” is basically agents who use logic at all, not just agents who are restricted to pure logic with no probabilistic beliefs.
OK, I get it. (Or at least I think I do.) And, duh, indeed it turns out (as you were too polite to say in so many words) that I was distinctly confused.
So: Using ordinary conditionals in planning your actions commits you to reasoning like “If (here in the actual world it turns out that) I choose to smoke this cigarette, then that makes it more likely that I have the weird genetic anomaly that causes both desire-to-smoke and lung cancer, so I’m more likely to die prematurely and horribly of lung cancer, so I shouldn’t smoke it”, which makes wrong decisions. So you want to use some sort of conditional that doesn’t work that way and rather says something more like “suppose everything about the world up to now is exactly as it is in the actual world, but magically-but-without-the-existence-of-magic-having-consequences I decide to do X; what then?”. And this is what you’re calling decision-theoretic counterfactuals, and the question is exactly what they should be; EDT says no, just use ordinary conditionals, CDT says pretty much what I just said, etc. The “smoking lesion” shows that EDT can give implausible results; “Death in Damascus” shows that CDT can give implausible results; etc.
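Here’s a worked toy version of the smoking lesion, with numbers I’ve made up, just to check I’ve pinned that contrast down:

```python
p_lesion = 0.10                 # agent's unconditional credence in having the lesion
p_lesion_given_smoke = 0.80     # evidential: deciding to smoke is strong evidence of the lesion
p_lesion_given_abstain = 0.02
u_smoke, u_cancer = 1.0, -100.0 # small enjoyment from smoking; large cost of cancer (via the lesion)

# EDT treats the action as evidence about the lesion:
edt_smoke   = u_smoke + u_cancer * p_lesion_given_smoke    # 1 - 80 = -79
edt_abstain =           u_cancer * p_lesion_given_abstain  #        =  -2
# -> EDT refuses to smoke, even though smoking doesn't cause the lesion.

# CDT intervenes: do(smoke) leaves the credence in the lesion at its prior:
cdt_smoke   = u_smoke + u_cancer * p_lesion                # 1 - 10 =  -9
cdt_abstain =           u_cancer * p_lesion                #        = -10
# -> CDT smokes, which is the verdict usually taken to be correct here.
print(edt_smoke, edt_abstain, cdt_smoke, cdt_abstain)
```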
All of which I really should have remembered, since it’s all stuff I have known in the past, but I am a doofus. My apologies.
(But my error wasn’t being too mired in EDT, or at least I don’t think it was; I think EDT is wrong. My error was having the term “counterfactual” too strongly tied in my head to what you call linguistic counterfactuals. Plus not thinking clearly about any of the actual decision theory.)
It still feels to me as if your proof-based agents are unrealistically narrow. Sure, they can incorporate whatever beliefs they have about the real world as axioms for their proofs—but only if those axioms end up being consistent, which means having perfectly consistent beliefs. The beliefs may of course be probabilistic, but then that means that all those beliefs have to have perfectly consistent probabilities assigned to them. Do you really think it’s plausible that an agent capable of doing real things in the real world can have perfectly consistent beliefs in this fashion? (I am pretty sure, for instance, that no human being has perfectly consistent beliefs; if any of us tried to do what your proof-based agents are doing, we would arrive at a contradiction—or fail to do so only because we weren’t trying hard enough.) I think “agents that use logic at all on the basis of beliefs about the world that are perfectly internally consistent” is a much narrower class than “agents that use logic at all”.
(That probably sounds like a criticism, but once again I am extremely aware that it may be that this feels implausible to me only because I am lacking important context, or confused about important things. After all, that was the case last time around. So my question is more “help me resolve my confusion” than “let me point out to you how the stuff you’ve been studying for ages is wrongheaded”, and I appreciate that you may have other more valuable things to do with your time than help to resolve my confusion :-).)
I’m glad I pointed out the difference between linguistic and DT counterfactuals, then!
I’m not at all suggesting that we use proof-based DT in this way. It’s just a model. I claim that it’s a pretty good model—that we can often carry over results to other, more complex, decision theories.
However, if we wanted to, then yes, I think we could… I agree that if we add beliefs as axioms, the axioms have to be perfectly consistent. But if we use probabilistic beliefs, the probabilities themselves don’t have to be perfectly consistent; only the axioms saying which probabilities we have do. So, for example, I could use a proof-based agent to approximate a logical-induction-based agent, by looking for proofs about what the market expectations are. This would be kind of convoluted, though.
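Here’s a toy version of that move (mine, and about as convoluted as promised): the axioms consistently describe which probabilities the agent has, even though those probabilities are themselves incoherent, so the things that get proved are facts about arithmetic on the declared numbers, never the dubious probabilities themselves.

```python
# These "declared beliefs" are incoherent as probabilities (they sum to 1.3),
# but the statements "belief('rain') = 0.6" and "belief('no rain') = 0.7" are
# jointly consistent as axioms, because they describe the agent's belief
# function rather than asserting anything about the world.
belief = {"rain": 0.6, "no rain": 0.7}

def declared_expected_utility(utilities):
    # What becomes provable is "the agent's declared EU equals this number",
    # not "these probabilities are correct (or even coherent)".
    return sum(belief[w] * u for w, u in utilities.items())

print(declared_expected_utility({"rain": -1.0, "no rain": 2.0}))  # 0.8
```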
I appreciate that it’s a model, but it seems—perhaps wrongly, since as already mentioned I am an ignorant doofus—as if at least some of what you’re doing with the model depends essentially on the strictly-logic-based nature of the agent. (E.g., the Troll Bridge problem as stated here seems that way, down to applying Löb’s theorem to the agent as formal system.)
Formal logic is very brittle; ex falso quodlibet, and all that; it (ignorantly and doofusily) looks to me as if you might be looking at a certain class of models and then finding problems (e.g., Troll Bridge) that are only problems because of specific features of the models that couldn’t realistically apply to the real world.
(In terms of the “rocket alignment problem” metaphor: suppose you start thinking about orbital mechanics, come up with exact-conic-section orbits as an interesting class of things to study, and prove some theorem that says that some class of things isn’t achievable for exact-conic-section orbits for a reason that comes down to something like dividing by the sum of squares of all the higher-order terms that are exactly zero for a perfect conic section orbit. That would be an interesting theorem, and it’s not hard to imagine how some less-rigid generalization of it might apply to real trajectories (“if the sum of squares of those coefficients is small then the trajectory is unstable and hard to get right” or something) -- but as it stands it doesn’t really tell you anything about real problems faced by real rockets whose trajectories are not perfect conic sections. And logic is generally much more brittle than orbital mechanics, chaos theory notwithstanding; there isn’t generally anything that corresponds to the sum of squares of coefficients being small; a proof that contains only a few small errors is not a proof at all.)
But, hmm, you reckon one could make a viable proof-based agent that has a consistent set of axioms describing a potentially-inconsistent set of probabilities. That’s an intriguing idea but I’m having trouble seeing how it would work. E.g., suppose I’m searching for proofs that my expected utility if I do X is at least Y units; that expectation obviously involves a whole lot of probabilities, and my actual probability assignments are inconsistent in various ways. How do I prove anything about my expected utilities if the probabilities involved might be inconsistent?
This is all a bit off topic on this particular post; it’s not especially about your account of decision-theoretic counterfactuals as such, but about the whole project of understanding decision theory in terms of agents whose decision processes involve trying to prove things about their own behaviour.