I never found Stalnaker’s thesis at all plausible, not because I’d thought of the ingenious little calculation you give but because it just seems obviously wrong intuitively. But I suppose if you don’t have any presuppositions about what sort of notion an implication is allowed to be, you don’t get to reject it on those grounds. So I wasn’t really entitled to say “Pr(A|B) is not the same thing as Pr(B=>A) for any particular notion of implication”, since I hadn’t thought of that calculation.
Anyway, I have just the same sense of obvious wrongness about this counterfactual version of Stalnaker. I suspect it’s harder to come up with an outright refutation, not least because there isn’t anything like general agreement about what C(A|B) means, whereas there’s something much nearer to that for Pr(A|B).
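To make the gap concrete for one particular notion of implication, here is a minimal sketch that compares Pr(A|B) with the probability of the material conditional B=>A. The joint distribution is a made-up assumption chosen only for illustration, and the material conditional is just one candidate reading of "=>"; this is not the calculation from the post.

```python
from fractions import Fraction

# Hypothetical joint distribution over (A, B); the numbers are illustrative only.
joint = {
    (True, True): Fraction(1, 10),
    (False, True): Fraction(3, 10),
    (True, False): Fraction(2, 10),
    (False, False): Fraction(4, 10),
}

def pr(event):
    """Probability of the set of (A, B) worlds where event(a, b) holds."""
    return sum(p for (a, b), p in joint.items() if event(a, b))

# Conditional probability Pr(A|B) = Pr(A and B) / Pr(B).
pr_a_given_b = pr(lambda a, b: a and b) / pr(lambda a, b: b)

# Probability of the material conditional: Pr(not B, or A).
pr_material = pr(lambda a, b: (not b) or a)

print(pr_a_given_b, pr_material)
```

With these assumed numbers the two quantities come apart (1/4 versus 7/10), because the material conditional is true in every world where B is false.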
At least some “nestings” of counterfactuals feel problematic to me. “Suppose it were true that if Bach had lived to be 90 then Mozart would have died at age 10; then if Dirichlet had lived to be 80, would Jacobi have died at 20?” The antecedent doesn’t do much to make clear just what is actually being supposed, and it’s not clear that this is made much better if we say instead “Suppose you believe, with credence 0.9, that if Bach had lived to be 90 then Mozart would have died at age 10; then how strongly do you believe that if Dirichlet had lived to be 80 then Jacobi would have died at 20?”. But I do think that a good analysis of counterfactuals should allow for questions of this form. (But, just as some conditional probabilities are 0⁄0 and some others are small/small and we shouldn’t trust our estimates much, some counterfactual probabilities are undefined or ill-conditioned. Whether or not they are actually literal ratios.)
Yeah, interesting. I don’t share your intuition that nested counterfactuals seem funny. The example you give doesn’t seem ill-defined due to the nesting of counterfactuals. Rather, the antecedent doesn’t seem very related to the consequent, which generally has a tendency to make counterfactuals ambiguous. If you ask “if calcium were always ionic, would Nixon have been elected president?” then I’m torn between three responses:
“No” because if we change chemistry, everything changes.
“Yes” because counterfactuals keep everything the same as much as possible, except what has to change; maybe we’re imagining a world where history is largely the same, but some specific biochemistry is different.
“I don’t know” because I am not sure what connection between the two you are trying to point at with the question, so, I don’t know how to answer.
In the case of your Bach example, I’m similarly torn. On the one hand, if we imagine some weird connection between the ages of Bach and Mozart, we might have to change a lot of things. On the other hand, counterfactuals usually try to keep things fixed if there’s not a reason to change them. So the intention of the question seems pretty unclear.
Which, in my mind, has little to do with the specific nested form of your question.
More importantly, perhaps, I think Stalnaker and other philosophers can be said to be investigating linguistic counterfactuals; their chief concern is formalizing the way humans naively talk about things, in a way which gives more clarity but doesn’t lose something important.
My chief concern is decision-theoretic counterfactuals, which are specifically being used to plan/act. This imposes different requirements.
The philosophy of linguistic counterfactuals is complex, of course, but personally I really feel that I understand fairly well what linguistic counterfactuals are and how they work. My picture probably requires a little exposition to be comprehensible, but to state it as simply as I can, I think linguistic counterfactuals can always be understood as “conditional probabilities, but using some reference frame rather than actual beliefs”. For example, very often we can understand counterfactuals as conditional probabilities from a past belief state. “If it had rained, we would not have come” can’t be understood as a conditional probability of the current beliefs where we knew we did come; but back up time a little bit, and it’s true that if it had been raining, we would not have made the trip.
Backing up time doesn’t always quite work. In those cases we can usually understand things in terms of a hypothetical “objective judge” who doesn’t know details of a situation but who knows things a “reasonable third party” would know. It makes sense that humans would have to consider this detached perspective a lot, in order to judge social situations; so it makes sense that we would have language for talking about it (IE counterfactual language).
We can make sense of nested linguistic counterfactuals in that way, too, if we wish. For example, “if driving had [counterfactually] meant not making it to the party, then we wouldn’t have done it” says (on my understanding) that if a reasonable third person would have looked at the situation and said that if we drive we won’t make it to the party, then, we would not have driven. (This in turn says that my past self would not have driven if he had believed that a reasonable third person wouldn’t believe that we would make it to the party, given the information that we’re driving.)
So, I think linguistic counterfactuals implicitly require a description of a third party / past self to be evaluated; this is usually obvious enough from conversation, but, can be an ambiguity.
However, I don’t think this analysis helps with decision-theoretic counterfactuals. At least, not directly.
I agree that much of what’s problematic about the example I gave is that the “inner” counterfactuals are themselves unclear. I was thinking that this makes the nested counterfactual harder to make sense of (exactly because it’s unclear what connection there might be between them) but on reflection I think you’re right that this isn’t really about counterfactual nesting and that if we picked other poorly-defined (non-counterfactual) propositions we’d get a similar effect: “If it were morally wrong to eat shellfish, would humans Really Truly Have Free Will?” or whatever.
I’d not given any thought before to your distinction between linguistic and decision-theoretic counterfactuals. I’m actually not sure I understand the distinction. It’s obvious how ordinary conditionals are important for planning and acting (you design a bridge so that it won’t fall down if someone drives a heavy lorry across it; you don’t cross a bridge because you think the troll underneath will eat you if you cross), but counterfactuals? I mean, obviously you can put them in to a particular problem: you’re crossing a bridge and there’s a troll who’ll blow up the bridge if you would have crossed it if there’d been a warning sign saying “do not cross”, or whatever. But that’s not counterfactuals being useful for decision theory, it’s some agent arbitrarily caring about counterfactuals—and agents can arbitrarily care about anything. (I am not entirely sure I’ve understood the “Troll Bridge” example you’re actually using, but to whatever extent it’s about counterfactuals it seems to be of this “agent arbitrarily caring about counterfactuals” type.) The thing you call “proof-based decision theory” involves trying to prove things of the form “if I do X, I will get at least Y utility” but those look like ordinary conditionals rather than counterfactuals to me too. (And in any case the whole idea of doing what you can rigorously prove from a given set of mathematical axioms gives you the most guaranteed utility seems bonkers to me anyway as anything other than a toy example, though this is pure prejudice and maybe there are better reasons for it than I can currently imagine: we want agents that can act in the actual world, about which one can generally prove precisely nothing of interest.) Could you give a couple of examples where counterfactuals are relevant to planning and acting without having been artificially inserted?
It may just be that none of this should be expected to make sense to someone not already immersed in the particular proof-based-decision-theory framework I think you’re working in, and that what I need in order to appreciate where you’re coming from is to spend a few hours (days? weeks?) getting familiar with that. At any rate, right now “passing Troll Bridge” looks to me like a problem applicable only to a very specific kind of decision-making agent, one I don’t see any particular reason to think has any prospect of ever being relevant to decision-making in the actual world—but I am extremely aware that this may be purely a reflection of my own ignorance.
It’s obvious how ordinary conditionals are important for planning and acting (you design a bridge so that it won’t fall down if someone drives a heavy lorry across it; you don’t cross a bridge because you think the troll underneath will eat you if you cross), but counterfactuals? I mean, obviously you can put them in to a particular problem
All the various reasoning behind a decision could involve material conditionals, probabilistic conditionals, logical implication, linguistic conditionals (whatever those are), linguistic counterfactuals, decision-theoretic counterfactuals (if those are indeed different as I claim), etc etc etc. I’m not trying to make the broad claim that counterfactuals are somehow involved.
The claim is about the decision algorithm itself. The claim is that the way we choose an action is by evaluating a counterfactual (“what happens if I take this action?”). Or, to be a little more psychologically realistic, the cached values which determine which actions we take are estimated counterfactual values.
What is the content of this claim?
A decision procedure is going to have (cached-or-calculated) value estimates which it uses to make decisions. (At least, most decision procedures work that way.) So the content of the claim is about the nature of these values.
If the values act like Bayesian conditional expectations, then the claim that we need counterfactuals to make decisions is considered false. This is the claim of evidential decision theory (EDT).
If the values are still well-defined for known-false actions, then they’re counterfactual. So, a fundamental reason why MIRI-type decision theory uses counterfactuals is to deal with the case of known-false actions.
However, academic decision theorists have used (causal) counterfactuals for completely different reasons (IE because they supposedly give better answers). This is the claim of causal decision theory (CDT).
My claim in the post, of course, is that the estimated values used to make decisions should match the EDT expected values almost all of the time, but, should not be responsive to the same kinds of reasoning which the EDT values are responsive to, so should not actually be evidential.
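As a rough illustration of the "known-false action" point, here is a minimal sketch. The belief table, the action names, and the counterfactual values are all hypothetical assumptions; in particular, the counterfactual table is simply stipulated, since how it should be derived is exactly the open question.

```python
from fractions import Fraction

# Hypothetical beliefs: joint probability over (action, utility) pairs.
# The agent is certain it will "stay", so "cross" has probability 0.
beliefs = {
    ("stay", 0): Fraction(1),
}

def conditional_expectation(beliefs, action):
    """EDT-style value E[U | action]; None when P(action) = 0."""
    p_action = sum(p for (a, _), p in beliefs.items() if a == action)
    if p_action == 0:
        return None  # 0/0: conditioning on a known-false action is undefined
    weighted = sum(p * u for (a, u), p in beliefs.items() if a == action)
    return weighted / p_action

# Stipulated (assumed) counterfactual values: defined for every action,
# including the one the agent believes it will not take.
counterfactual_value = {"stay": Fraction(0), "cross": Fraction(10)}

print(conditional_expectation(beliefs, "stay"))
print(conditional_expectation(beliefs, "cross"))
print(counterfactual_value["cross"])
```

The conditional expectation has nothing to say about "cross", while the counterfactual table still assigns it a value. Nothing here says where that table should come from.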
Could you give a couple of examples where counterfactuals are relevant to planning and acting without having been artificially inserted?
It sounds like you’ve kept a really strong assumption of EDT in your head; so strong that you couldn’t even imagine why non-evidential reasoning might be part of an agent’s decision procedure. My example is the troll bridge: conditional reasoning (whether proof-based or expectation-based) ends up not crossing the bridge, where counterfactual reasoning can cross (if we get the counterfactuals right).
The thing you call “proof-based decision theory” involves trying to prove things of the form “if I do X, I will get at least Y utility” but those look like ordinary conditionals rather than counterfactuals to me too.
Right. In the post, I argue that using proofs like this is more like a form of EDT rather than CDT, so, I’m more comfortable calling this “conditional reasoning” (lumping it in with probabilistic conditionals).
The Troll Bridge is supposed to show a flaw in this kind of reasoning, suggesting that we need counterfactual reasoning instead (at least, if “counterfactual” is broadly understood to be anything other than conditional reasoning—a simplification which mostly makes sense in practice).
though this is pure prejudice and maybe there are better reasons for it than I can currently imagine: we want agents that can act in the actual world, about which one can generally prove precisely nothing of interest
Oh, yeah, proof-based agents can technically do anything which regular expectation-based agents can do. Just take the probabilistic model the expectation-based agents are using, and then have the proof-based agent take the action for which it can prove the highest expectation. This isn’t totally sleight of hand; the proof-based agent will still display some interesting behavior if it is playing games with other proof-based agents, dealing with Omega, etc.
At any rate, right now “passing Troll Bridge” looks to me like a problem applicable only to a very specific kind of decision-making agent, one I don’t see any particular reason to think has any prospect of ever being relevant to decision-making in the actual world—but I am extremely aware that this may be purely a reflection of my own ignorance.
Even if proof-based decision theory didn’t generalize to handle uncertain reasoning, the troll bridge would also apply to expectation-based reasoners if their expectations respect logic. So the narrow class of agents for whom it makes sense to ask “does this agent pass the troll bridge” are basically agents who use logic at all, not just agents who are restricted to pure logic with no probabilistic belief.
OK, I get it. (Or at least I think I do.) And, duh, indeed it turns out (as you were too polite to say in so many words) that I was distinctly confused.
So: Using ordinary conditionals in planning your actions commits you to reasoning like “If (here in the actual world it turns out that) I choose to smoke this cigarette, then that makes it more likely that I have the weird genetic anomaly that causes both desire-to-smoke and lung cancer, so I’m more likely to die prematurely and horribly of lung cancer, so I shouldn’t smoke it”, which makes wrong decisions. So you want to use some sort of conditional that doesn’t work that way and rather says something more like “suppose everything about the world up to now is exactly as it is in the actual world, but magically-but-without-the-existence-of-magic-having-consequences I decide to do X; what then?”. And this is what you’re calling decision-theoretic counterfactuals, and the question is exactly what they should be; EDT says no, just use ordinary conditionals, CDT says pretty much what I just said, etc. The “smoking lesion” shows that EDT can give implausible results; “Death in Damascus” shows that CDT can give implausible results; etc.
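The smoking-lesion contrast can be sketched numerically. This is a toy model under assumed numbers (the lesion prior, the smoking and cancer rates, and the utilities are all invented for illustration), with cancer depending only on the lesion and never on smoking itself. `edt_value` conditions on the action; `cdt_value` holds the lesion prior fixed, as a crude stand-in for "everything up to now is as it actually is".

```python
from fractions import Fraction as F

# All numbers below are illustrative assumptions.
P_LESION = F(1, 2)
P_SMOKE_GIVEN = {True: F(4, 5), False: F(1, 5)}     # P(smoke | lesion?)
P_CANCER_GIVEN = {True: F(9, 10), False: F(1, 10)}  # P(cancer | lesion?)
U_SMOKE, U_CANCER = 1, -100

def p_lesion(lesion):
    return P_LESION if lesion else 1 - P_LESION

def p_action_given(smoke, lesion):
    return P_SMOKE_GIVEN[lesion] if smoke else 1 - P_SMOKE_GIVEN[lesion]

def utility(smoke, p_cancer):
    return (U_SMOKE if smoke else 0) + U_CANCER * p_cancer

def edt_value(smoke):
    """E[U | action]: the action is treated as evidence about the lesion."""
    weights = {l: p_lesion(l) * p_action_given(smoke, l) for l in (True, False)}
    total = sum(weights.values())
    p_cancer = sum(w / total * P_CANCER_GIVEN[l] for l, w in weights.items())
    return utility(smoke, p_cancer)

def cdt_value(smoke):
    """Value with the lesion prior held fixed, whatever the action is."""
    p_cancer = sum(p_lesion(l) * P_CANCER_GIVEN[l] for l in (True, False))
    return utility(smoke, p_cancer)

for smoke in (True, False):
    print(smoke, edt_value(smoke), cdt_value(smoke))
```

Under these assumptions the conditional values favour not smoking (-73 versus -26), while the fixed-prior values favour smoking by exactly the one unit of enjoyment (-49 versus -50), which is the disagreement described above.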
All of which I really should have remembered, since it’s all stuff I have known in the past, but I am a doofus. My apologies.
(But my error wasn’t being too mired in EDT, or at least I don’t think it was; I think EDT is wrong. My error was having the term “counterfactual” too strongly tied in my head to what you call linguistic counterfactuals. Plus not thinking clearly about any of the actual decision theory.)
It still feels to me as if your proof-based agents are unrealistically narrow. Sure, they can incorporate whatever beliefs they have about the real world as axioms for their proofs—but only if those axioms end up being consistent, which means having perfectly consistent beliefs. The beliefs may of course be probabilistic, but then that means that all those beliefs have to have perfectly consistent probabilities assigned to them. Do you really think it’s plausible that an agent capable of doing real things in the real world can have perfectly consistent beliefs in this fashion? (I am pretty sure, for instance, that no human being has perfectly consistent beliefs; if any of us tried to do what your proof-based agents are doing, we would arrive at a contradiction—or fail to do so only because we weren’t trying hard enough.) I think “agents that use logic at all on the basis of beliefs about the world that are perfectly internally consistent” is a much narrower class than “agents that use logic at all”.
(That probably sounds like a criticism, but once again I am extremely aware that it may be that this feels implausible to me only because I am lacking important context, or confused about important things. After all, that was the case last time around. So my question is more “help me resolve my confusion” than “let me point out to you how the stuff you’ve been studying for ages is wrongheaded”, and I appreciate that you may have other more valuable things to do with your time than help to resolve my confusion :-).)
All of which I really should have remembered, since it’s all stuff I have known in the past, but I am a doofus. My apologies.
(But my error wasn’t being too mired in EDT, or at least I don’t think it was; I think EDT is wrong. My error was having the term “counterfactual” too strongly tied in my head to what you call linguistic counterfactuals. Plus not thinking clearly about any of the actual decision theory.)
I’m glad I pointed out the difference between linguistic and DT counterfactuals, then!
It still feels to me as if your proof-based agents are unrealistically narrow. Sure, they can incorporate whatever beliefs they have about the real world as axioms for their proofs—but only if those axioms end up being consistent, which means having perfectly consistent beliefs. The beliefs may of course be probabilistic, but then that means that all those beliefs have to have perfectly consistent probabilities assigned to them. Do you really think it’s plausible that an agent capable of doing real things in the real world can have perfectly consistent beliefs in this fashion?
I’m not at all suggesting that we use proof-based DT in this way. It’s just a model. I claim that it’s a pretty good model—that we can often carry over results to other, more complex, decision theories.
However, if we wanted to, then yes, I think we could… I agree that if we add beliefs as axioms, the axioms have to be perfectly consistent. But if we use probabilistic beliefs, those probabilities don’t have to be perfectly consistent; just the axioms saying which probabilities we have. So, for example, I could use a proof-based agent to approximate a logical-induction-based agent, by looking for proofs about what the market expectations are. This would be kind of convoluted, though.
I appreciate that it’s a model, but it seems—perhaps wrongly, since as already mentioned I am an ignorant doofus—as if at least some of what you’re doing with the model depends essentially on the strictly-logic-based nature of the agent. (E.g., the Troll Bridge problem as stated here seems that way, down to applying Löb’s theorem to the agent as formal system.)
Formal logic is very brittle; ex falso quodlibet, and all that; it (ignorantly and doofusily) looks to me as if you might be looking at a certain class of models and then finding problems (e.g., Troll Bridge) that are only problems because of specific features of the models that couldn’t realistically apply to the real world.
(In terms of the “rocket alignment problem” metaphor: suppose you start thinking about orbital mechanics, come up with exact-conic-section orbits as an interesting class of things to study, and prove some theorem that says that some class of things isn’t achievable for exact-conic-section orbits for a reason that comes down to something like dividing by the sum of squares of all the higher-order terms that are exactly zero for a perfect conic section orbit. That would be an interesting theorem, and it’s not hard to imagine how some less-rigid generalization of it might apply to real trajectories (“if the sum of squares of those coefficients is small then the trajectory is unstable and hard to get right” or something) -- but as it stands it doesn’t really tell you anything about real problems faced by real rockets whose trajectories are not perfect conic sections. And logic is generally much more brittle than orbital mechanics, chaos theory notwithstanding; there isn’t generally anything that corresponds to the sum of squares of coefficients being small; a proof that contains only a few small errors is not a proof at all.)
But, hmm, you reckon one could make a viable proof-based agent that has a consistent set of axioms describing a potentially-inconsistent set of probabilities. That’s an intriguing idea but I’m having trouble seeing how it would work. E.g., suppose I’m searching for proofs that my expected utility if I do X is at least Y units; that expectation obviously involves a whole lot of probabilities, and my actual probability assignments are inconsistent in various ways. How do I prove anything about my expected utilities if the probabilities involved might be inconsistent?
This is all a bit off topic on this particular post; it’s not especially about your account of decision-theoretic counterfactuals as such, but about the whole project of understanding decision theory in terms of agents whose decision processes involve trying to prove things about their own behaviour.
I never found Stalnaker’s thesis at all plausible, not because I’d thought of the ingenious little calculation you give but because it just seems obviously wrong intuitively. But I suppose if you don’t have any presuppositions about what sort of notion an implication is allowed to be, you don’t get to reject it on those grounds. So I wasn’t really entitled to say “Pr(A|B) is not the same thing as Pr(B=>A) for any particular notion of implication”, since I hadn’t thought of that calculation.
Anyway, I have just the same sense of obvious wrongness about this counterfactual version of Stalnaker. I suspect it’s harder to come up with an outright refutation, not least because there isn’t anything like general agreement about what C(A|B) means, whereas there’s something much nearer to that for Pr(A|B).
At least some “nestings” of counterfactuals feel problematic to me. “Suppose it were true that if Bach had lived to be 90 then Mozart would have died at age 10; then if Dirichlet had lived to be 80, would Jacobi have died at 20?” The antecedent doesn’t do much to make clear just what is actually being supposed, and it’s not clear that this is made much better if we say instead “Suppose you believe, with credence 0.9, that if Bach had lived to be 90 then Mozart would have died at age 10; then how strongly do you believe that if Dirichlet had lived to be 80 then Jacobi would have died at 20?”. But I do think that a good analysis of counterfactuals should allow for questions of this form. (But, just as some conditional probabilities are 0⁄0 and some others are small/small and we shouldn’t trust our estimates much, some counterfactual probabilities are undefined or ill-conditioned. Whether or not they are actually literal ratios.)
Yeah, interesting. I don’t share your intuition that nested counterfactuals seem funny. The example you give doesn’t seem ill-defined due to the nesting of counterfactuals. Rather, the antecedent doesn’t seem very related to the consequent, which generally has a tendency to make counterfactuals ambiguous. If you ask “if calcium were always ionic, would Nixon have been elected president?” then I’m torn between three responses:
“No” because if we change chemistry, everything changes.
“Yes” because counterfactuals keep everything the same as much as possible, except what has to change; maybe we’re imagining a world where history is largely the same, but some specific biochemistry is different.
“I don’t know” because I am not sure what connection between the two you are trying to point at with the question, so, I don’t know how to answer.
In the case of your Bach example, I’m similarly torn. On the one hand, if we imagine some weird connection between the ages of Back and Mozart, we might have to change a lot of things. On the other hand, counterfactuals usually try to keep thing fixed if there’s not a reason to change them. So the intention of the question seems pretty unclear.
Which, in my mind, has little to do with the specific nested form of your question.
More importantly, perhaps, I think Stalnaker and other philosophers can be said to be investigating linguistic counterfactuals; their chief concern is formalizing the way humans naively talk about things, in a way which gives more clarity but doesn’t lose something important.
My chief concern is decision-theoretic counterfactuals, which are specifically being used to plan/act. This imposes different requirements.
The philosophy of linguistic counterfactuals is complex, of course, but personally I really feel that I understand fairly well what linguistic counterfactuals are and how they work. My picture probably requires a little exposition to be comprehensible, but to state it as simply as I can, I think linguistic counterfactuals can always be understood as “conditional probabilities, but using some reference frame rather than actual beliefs”. For example, very often we can understand counterfactuals as conditional probabilities from a past belief state. “If it had rained, we would not have come” can’t be understood as a conditional probability of the current beliefs where we knew we did come; but back up time a little bit, and it’s true that if it had been raining, we would not have made the trip.
Backing up time doesn’t always quite work. In those cases we can usually understand things in terms of a hypothetical “objective judge” who doesn’t know details of a situation but who knows things a “reasonable third party” would know. It makes sense that humans would have to consider this detached perspective a lot, in order to judge social situations; so it makes sense that we would have language for talking about it (IE counterfactual language).
We can make sense of nested linguistic counterfactuals in that way, too, if we wish. For example, “if driving had [counterfactually] meant not making it to the party, then we wouldn’t have done it” says (on my understanding) that if a reasonable third person would have looked at the situation and said that if we drive we won’t make it to the party, then, we would not have driven. (This in turn says that my past self would have not driven if he had believed that a resonable third person wouldn’t believe that we would make it to the party, given the information that we’re driving.)
So, I think linguistic counterfactuals implicitly require a description of a third party / past self to be evaluated; this is usually obvious enough from conversation, but, can be an ambiguity.
However, I don’t think this analysis helps with decision-theoretic counterfactuals. At least, not directly.
I agree that much of what’s problematic about the example I gave is that the “inner” counterfactuals are themselves unclear. I was thinking that this makes the nested counterfactual harder to make sense of (exactly because it’s unclear what connection there might be between them) but on reflection I think you’re right that this isn’t really about counterfactual nesting and that if we picked other poorly-defined (non-counterfactual) propositions we’d get a similar effect: “If it were morally wrong to eat shellfish, would humans Really Truly Have Free Will?” or whatever.
I’d not given any thought before to your distinction between linguistic and decision-theoretic counterfactuals. I’m actually not sure I understand the distinction. It’s obvious how ordinary conditionals are important for planning and acting (you design a bridge so that it won’t fall down if someone drives a heavy lorry across it; you don’t cross a bridge because you think the troll underneath will eat you if you cross), but counterfactuals? I mean, obviously you can put them in to a particular problem: you’re crossing a bridge and there’s a troll who’ll blow up the bridge if you would have crossed it if there’d been a warning sign saying “do not cross”, or whatever. But that’s not counterfactuals being useful for decision theory, it’s some agent arbitrarily caring about counterfactuals—and agents can arbitrarily care about anything. (I am not entirely sure I’ve understood the “Troll Bridge” example you’re actually using, but to whatever extent it’s about counterfactuals it seems to be of this “agent arbitrarily caring about counterfactuals” type.) The thing you call “proof-based decision theory” involves trying to prove things of the form “if I do X, I will get at least Y utility” but those look like ordinary conditionals rather than counterfactuals to me too. (And in any case the whole idea of doing what you can rigorously prove from a given set of mathematical axioms gives you the most guaranteed utility seems bonkers to me anyway as anything other than a toy example, though this is pure prejudice and maybe there are better reasons for it than I can currently imagine: we want agents that can act in the actual world, about which one can generally prove precisely nothing of interest.) Could you give a couple of examples where counterfactuals are relevant to planning and acting without having been artificially inserted?
It may just be that none of this should be expected to make sense to someone not already immersed in the particular proof-based-decision-theory framework I think you’re working in, and that what I need in order to appreciate where you’re coming from is to spend a few hours (days? weeks?) getting familiar with that. At any rate, right now “passing Troll Bridge” looks to me like a problem applicable only to a very specific kind of decision-making agent, one I don’t see any particular reason to think has any prospect of ever being relevant to decision-making in the actual world—but I am extremely aware that this may be purely a reflection of my own ignorance.
All the various reasoning behind a decision could involve material conditionals, probabilistic conditionals, logical implication, linguistic conditionals (whatever those are), linguistic counterfactuals, decision-theoretic counterfactuals (if those are indeed different as I claim), etc etc etc. I’m not trying to make the broad claim that counterfactuals are somehow involved.
The claim is about the decision algorithm itself. The claim is that the way we choose an action is by evaluating a counterfactual (“what happens if I take this action?”). Or, to be a little more psychologically realistic, the cashed values which determine which actions we take are estimated counterfactual values.
What is the content of this claim?
A decision procedure is going to have (cashed-or-calculated) value estimates which it uses to make decisions. (At least, most decision procedures work that way.) So the content of the claim is about the nature of these values.
If the values act like Bayesian conditional expectations, then the claim that we need counterfactuals to make decisions is considered false. This is the claim of evidential decision theory (EDT).
If the values are still well-defined for known-false actions, then they’re counterfactual. So, a fundamental reason why MIRI-type decision theory uses counterfactuals is to deal with the case of known-false actions.
However, academic decision theorists have used (causal) counterfactuals for completely different reasons (IE because they supposedly give better answers). This is the claim of causal decision theory (CDT).
My claim in the post, of course, is that the estimated values used to make decisions should match the EDT expected values almost all of the time, but, should not be responsive to the same kinds of reasoning which the EDT values are responsive to, so should not actually be evidential.
It sounds like you’ve kept a really strong assumption of EDT in your head; so strong that you couldn’t even imagine why non-evidential reasoning might be part of an agent’s decision procedure. My example is the troll bridge: conditional reasoning (whether proof-based or expectation-based) ends up not crossing the bridge, where counterfactual reasoning can cross (if we get the counterfactuals right).
Right. In the post, I argue that using proofs like this is more like a form of EDT than of CDT, so I’m more comfortable calling this “conditional reasoning” (lumping it in with probabilistic conditionals).
The Troll Bridge is supposed to show a flaw in this kind of reasoning, suggesting that we need counterfactual reasoning instead (at least, if “counterfactual” is broadly understood to be anything other than conditional reasoning—a simplification which mostly makes sense in practice).
Oh, yeah, proof-based agents can technically do anything which regular expectation-based agents can do. Just take the probabilistic model the expectation-based agents are using, and then have the proof-based agent take the action for which it can prove the highest expectation. This isn’t totally sleight of hand; the proof-based agent will still display some interesting behavior if it is playing games with other proof-based agents, dealing with Omega, etc.
Even if proof-based decision theory didn’t generalize to handle uncertain reasoning, the troll bridge would also apply to expectation-based reasoners if their expectations respect logic. So the narrow class of agents for whom it makes sense to ask “does this agent pass the troll bridge” are basically agents who use logic at all, not just agents who are restricted to pure logic with no probabilistic belief.
OK, I get it. (Or at least I think I do.) And, duh, indeed it turns out (as you were too polite to say in so many words) that I was distinctly confused.
So: Using ordinary conditionals in planning your actions commits you to reasoning like “If (here in the actual world it turns out that) I choose to smoke this cigarette, then that makes it more likely that I have the weird genetic anomaly that causes both desire-to-smoke and lung cancer, so I’m more likely to die prematurely and horribly of lung cancer, so I shouldn’t smoke it”, which makes wrong decisions. So you want to use some sort of conditional that doesn’t work that way and rather says something more like “suppose everything about the world up to now is exactly as it is in the actual world, but magically-but-without-the-existence-of-magic-having-consequences I decide to do X; what then?”. And this is what you’re calling decision-theoretic counterfactuals, and the question is exactly what they should be; EDT says no, just use ordinary conditionals, CDT says pretty much what I just said, etc. The “smoking lesion” shows that EDT can give implausible results; “Death in Damascus” shows that CDT can give implausible results; etc.
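The smoking-lesion contrast described above can be worked through numerically. This is only a sketch under made-up assumptions: the probabilities and utilities are invented, smoking is stipulated to have no causal effect on cancer, and the "CDT" value is modelled in one simple way, by holding the prior on the lesion fixed whatever action is chosen.

```python
# Toy smoking-lesion model. All numbers are invented for illustration.
P_LESION = 0.2
P_SMOKE_GIVEN_LESION = {True: 0.9, False: 0.1}     # the lesion causes the desire to smoke
P_CANCER_GIVEN_LESION = {True: 0.8, False: 0.05}   # smoking itself has no causal effect
U_SMOKE = 10.0
U_CANCER = -1000.0


def utility(smoke, cancer):
    return (U_SMOKE if smoke else 0.0) + (U_CANCER if cancer else 0.0)


def p_lesion_given_action(smoke):
    """Bayes' rule: treat the action as evidence about the lesion."""
    def p_act(lesion):
        p = P_SMOKE_GIVEN_LESION[lesion]
        return p if smoke else 1.0 - p
    numerator = P_LESION * p_act(True)
    return numerator / (numerator + (1.0 - P_LESION) * p_act(False))


def expected_utility(smoke, p_lesion):
    total = 0.0
    for lesion, pl in ((True, p_lesion), (False, 1.0 - p_lesion)):
        pc = P_CANCER_GIVEN_LESION[lesion]
        total += pl * (pc * utility(smoke, True) + (1.0 - pc) * utility(smoke, False))
    return total


def edt_value(smoke):
    # Ordinary conditional: update the lesion probability on the action.
    return expected_utility(smoke, p_lesion_given_action(smoke))


def cdt_value(smoke):
    # Intervention-style (assumed) reading: the action does not change the lesion prior.
    return expected_utility(smoke, P_LESION)


def best_action(value_fn):
    return max((True, False), key=value_fn)


print("EDT smokes:", best_action(edt_value))
print("CDT smokes:", best_action(cdt_value))
```

With these numbers the conditional values recommend not smoking, while the intervention-style values differ between the two actions only by the utility of smoking itself and so recommend smoking.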
All of which I really should have remembered, since it’s all stuff I have known in the past, but I am a doofus. My apologies.
(But my error wasn’t being too mired in EDT, or at least I don’t think it was; I think EDT is wrong. My error was having the term “counterfactual” too strongly tied in my head to what you call linguistic counterfactuals. Plus not thinking clearly about any of the actual decision theory.)
It still feels to me as if your proof-based agents are unrealistically narrow. Sure, they can incorporate whatever beliefs they have about the real world as axioms for their proofs—but only if those axioms end up being consistent, which means having perfectly consistent beliefs. The beliefs may of course be probabilistic, but then that means that all those beliefs have to have perfectly consistent probabilities assigned to them. Do you really think it’s plausible that an agent capable of doing real things in the real world can have perfectly consistent beliefs in this fashion? (I am pretty sure, for instance, that no human being has perfectly consistent beliefs; if any of us tried to do what your proof-based agents are doing, we would arrive at a contradiction—or fail to do so only because we weren’t trying hard enough.) I think “agents that use logic at all on the basis of beliefs about the world that are perfectly internally consistent” is a much narrower class than “agents that use logic at all”.
(That probably sounds like a criticism, but once again I am extremely aware that it may be that this feels implausible to me only because I am lacking important context, or confused about important things. After all, that was the case last time around. So my question is more “help me resolve my confusion” than “let me point out to you how the stuff you’ve been studying for ages is wrongheaded”, and I appreciate that you may have other more valuable things to do with your time than help to resolve my confusion :-).)
I’m glad I pointed out the difference between linguistic and DT counterfactuals, then!
I’m not at all suggesting that we use proof-based DT in this way. It’s just a model. I claim that it’s a pretty good model—that we can often carry over results to other, more complex, decision theories.
However, if we wanted to, then yes, I think we could… I agree that if we add beliefs as axioms, the axioms have to be perfectly consistent. But if we use probabilistic beliefs, those probabilities don’t have to be perfectly consistent; just the axioms saying which probabilities we have. So, for example, I could use a proof-based agent to approximate a logical-induction-based agent, by looking for proofs about what the market expectations are. This would be kind of convoluted, though.
I appreciate that it’s a model, but it seems—perhaps wrongly, since as already mentioned I am an ignorant doofus—as if at least some of what you’re doing with the model depends essentially on the strictly-logic-based nature of the agent. (E.g., the Troll Bridge problem as stated here seems that way, down to applying Löb’s theorem to the agent as formal system.)
Formal logic is very brittle; ex falso quodlibet, and all that; it (ignorantly and doofusily) looks to me as if you might be looking at a certain class of models and then finding problems (e.g., Troll Bridge) that are only problems because of specific features of the models that couldn’t realistically apply to the real world.
(In terms of the “rocket alignment problem” metaphor: suppose you start thinking about orbital mechanics, come up with exact-conic-section orbits as an interesting class of things to study, and prove some theorem that says that some class of things isn’t achievable for exact-conic-section orbits for a reason that comes down to something like dividing by the sum of squares of all the higher-order terms that are exactly zero for a perfect conic section orbit. That would be an interesting theorem, and it’s not hard to imagine how some less-rigid generalization of it might apply to real trajectories (“if the sum of squares of those coefficients is small then the trajectory is unstable and hard to get right” or something) -- but as it stands it doesn’t really tell you anything about real problems faced by real rockets whose trajectories are not perfect conic sections. And logic is generally much more brittle than orbital mechanics, chaos theory notwithstanding; there isn’t generally anything that corresponds to the sum of squares of coefficients being small; a proof that contains only a few small errors is not a proof at all.)
But, hmm, you reckon one could make a viable proof-based agent that has a consistent set of axioms describing a potentially-inconsistent set of probabilities. That’s an intriguing idea but I’m having trouble seeing how it would work. E.g., suppose I’m searching for proofs that my expected utility if I do X is at least Y units; that expectation obviously involves a whole lot of probabilities, and my actual probability assignments are inconsistent in various ways. How do I prove anything about my expected utilities if the probabilities involved might be inconsistent?
This is all a bit off topic on this particular post; it’s not especially about your account of decision-theoretic counterfactuals as such, but about the whole project of understanding decision theory in terms of agents whose decision processes involve trying to prove things about their own behaviour.