$1000 USD prize—Circular Dependency of Counterfactuals
Congrats to the winner TailCalled with their post Some thoughts on “The Nature of Counterfactuals”. See the winner announcement post.
I’ve previously argued that the concept of counterfactuals can only be understood from within the counterfactual perspective.
I will be awarding a $1000 prize for the best post that engages with the idea that counterfactuals may be circular in this sense. The winning entry may be one of the following (these categories aren’t intended to be exclusive):
a) A post that attempts to draw out the consequences of this principle for decision theory
b) A post that attempts to evaluate the arguments for and against adopting the principle that counterfactuals only make sense from within the counterfactual perspective
c) A review of relevant literature in philosophy or decision theory
d) A post that restates already existing ideas in a clearer or more accessible manner (I don’t think this topic has been explored much on LW, but it may have been explored in the literature on decision theory or philosophy)
Feel free to ask me for clarification about what would be on or off-topic. Probably the main thing I’d like to see is substantial engagement with this principle. The bounty is for posts that engage with the notion that counterfactuals might only make sense from within a counterfactual perspective. I have written on this topic, but the competition isn’t limited to posts that engage with my views on this topic. It’s perfectly fine to engage with other arguments for this proposition if, for example, you find someone arguing in favour of this in the philosophical/mathematical literature or Less Wrong.
If someone submits a high-quality post that only touches on this issue tangentially, but someone else submits an only okayish post that tries to deeply engage with this issue, then I would likely award it to the latter as I’m trying to incentivise more engagement with this issue rather than just high-quality posts generally. If the bounty is awarded to an unexpected submission, I expect this to be the main cause.
I will be awarding an additional $100 for the best short-form post on this topic. This may be a LW Shortform post, a public Facebook post, a Twitter thread, etc. (I’m not going to include Discord/Slack messages as they aren’t accessible).
Why do I believe in this principle?
Roughly, my reasons are as follows:
Rejecting David Lewis’ Counterfactual Realism as absurd and therefore concluding that counterfactuals must be at least partially a human construction: either a) in the sense of them being an inevitable and essential part of how we make sense of the world by our very nature or b) in the sense of being a semi-arbitrary and contingent system that we’ve adopted in order to navigate the world
Insofar as counterfactuals are inherently a part of how we interpret the world, the only way that we can understand them is to “look out through them”, notice what we see, and attempt to characterise this as precisely as possible
Insofar as counterfactuals are a somewhat arbitrary and contingent system constructed in order to navigate the world, the way that the system is justified is by imagining adopting various mental frameworks and noticing that a particular framework seems like it would be useful over a wide variety of circumstances. However, we’ve just invoked counterfactuals twice: a) by imagining adopting different mental frameworks b) by imagining different circumstances over which to evaluate these frameworks[1].
In either case, we seem to be unable to characterise counterfactuals without depending on already having the concept of counterfactuals. Or at least, I find this argument persuasive.
Why do I believe this is important?
I’ve argued for the importance of agent meta-foundations before. Roughly, there seems to be a lot of confusion about what counterfactuals are and how to construct them. I believe that much of this confusion would be cleared up if we can sort out some of these foundational issues. And the claim that counterfactuals can only be understood from an interior perspective is one such issue.
Why am I posting this bounty?
I believe in this idea, but:
I haven’t been able to dedicate nearly as much time to exploring this as I would like in between all of my other commitments
Working on this approach just by myself is kind of lonely and extremely challenging (for example, it’s hard to get good quality feedback)
I suspect that more people would be persuaded that this was a fruitful approach if this principle was presented to them in a different light.
How do I submit my entry?
Make a post on LW or the Alignment forum, then add a link in the comments below. I guess I’m also open to private submissions. Ideally, you should mention that you’re submitting your post for the bounty just to make sure that I’m aware of it.
When do I need to submit by?
I’m currently planning to set the submission window to 3 months from the date of this post (that would be the 1st of April, but let’s make it April 2nd so people don’t think this competition is some kind of prank). Submissions after this date may be refused.
How will this be judged?
I’ve written on this topic myself, so this probably biases me in some ways, but $1000 is a small enough amount of money that it’s probably not worthwhile looking for external judges.
Some Background Info
I guess I started to believe that counterfactuals were circular when I started to ask questions like, “What actually are these things we call counterfactuals?”. I noticed that they didn’t seem to exist in a literal sense, but that we also seem to be unable to do without them.
Some people have asked why the Bayesian Network approach suggested by Judea Pearl is insufficient (including in the comments below). This approach is firmly rooted in Causal Decision Theory (CDT). Most people on LW have rejected CDT because of its failure to handle Newcomb’s Problem.
MIRI has proposed Functional Decision Theory (FDT) as an alternative, but this theory is dependent on logical counterfactuals and they haven’t figured out exactly how to construct these. While I don’t exactly agree with the logical counterfactual framing, I agree that these kinds of exotic decision theory problems require us to create a new notion of counterfactuals. And this naturally leads to questions about what counterfactuals really are which I see as further leading to the conclusion that they are circular.
I can see why many people are sufficiently skeptical of the notion of counterfactuals being circular that they dismiss it out of hand. It’s entirely possible that I could be mistaken about this thesis, but for these people, I’d suggest reading Eliezer’s post Where Recursive Justification Hits Bottom which argues for a circular epistemology since if you are persuaded by this post, counterfactuals being circular may then be less of a jump.
Fine Print
I’ll award the prize assuming that there’s at least one semi-decent submission (according to the standards of posts on Less Wrong). If this isn’t the case, then I’ll donate the money to an AI Safety organization instead. I’d be open to having this money be held in escrow.
I’m intending to award the prize to the top entry, but there’s a chance that I split it if I can’t make a decision.
- ^
Counterpoint: requiring counterfactuals to justify their own use isn’t the same as counterfactuals only making sense from within themselves. Response: It’s possible to engage in the appropriate symbol manipulation without a concept of counterfactuals, but we can’t have a semantic understanding of what we’re doing. We can’t even describe this process without being able to say things like “if given string of symbols s, do y”. Similarly, counterfactuals aren’t just justified by imagining the consequences of applying different mental frameworks over different circumstances; in this case, they are a system for performing well over a variety of circumstances.
I previously wrote a post about reconciling free will with determinism. The metaphysics implicit in Pearlian causality is free will (In Drescher’s words: “Pearl’s formalism models free will rather than mechanical choice.”). The challenge is reconciling this metaphysics with the belief that one is physically embodied. That is what the post attempts to do; these perspectives aren’t inherently irreconcilable, we just have to be really careful about e.g. distinguishing “my action” vs “the action of the computer embodying me” in the Bayes net and distinguishing the interventions on them.
I wrote another post about two alternatives to logical counterfactuals: one says counterfactuals don’t exist, one says that your choice of policy should affect your anticipation of your own source code. (I notice you already commented on this post, just noting it for completeness)
And a third post, similar to the first, reconciling free will with determinism using linear logic.
I’m interested in what you think of these posts and what feels unclear/unresolved, I might write a new explanation of the theoretical perspective or improve/extend/modify it in response.
You’ve linked me to three different posts, so I’ll address them in separate comments.
Two Alternatives to Logical Counterfactuals
I actually really liked this post, enough that I changed my original upvote to a strong upvote. I also disagree with the notion that logical counterfactuals make sense when taken literally, so I really appreciated you making this point persuasively. I agreed with your criticisms of the material conditional approach and I think policy-dependent source code could be potentially promising. I guess this naturally leads to the question of how to justify this approach. This results in questions like, “What exactly is a counterfactual?” and “Why exactly do we want such a notion?” and I believe that following this path leads to the discovery that counterfactuals are circular.
I’m more open to saying that I adopt Counterfactual Non-Realism than I was when I originally commented although I don’t see theories based on material conditionals as the only approach within this category. I guess I’m also more enthusiastic about thinking in terms of policies rather than action mainly because of the lesson I drew from the Counterfactual Prisoner’s Dilemma. I don’t really know why I didn’t make this connection at the time, since I had written that post a few months prior, but I appear to have missed this.
I still feel that introducing the term “free will” is too loaded to be helpful here, regardless of whether you are or aren’t using it in a non-standard fashion. Like I’d encourage you to structure your posts to try to separate:
a) This is how we handle counterfactuals
b) These are the implications of this for the free will debate
A large part of this is because I suspect many people on Less Wrong are simply allergic to this term.
Thoughts on Modeling Naturalized Logic Decision Theory Problems in Linear Logic
I hadn’t heard of linear logic before—it seems like a cool formalisation—although I tend to believe that formalisations are overrated as unless they are used very carefully they can obscure more than they reveal.
I believe that spurious counterfactuals are only an issue with the 5 and 10 problem because of an attempt to hack logical-if to substitute for counterfactual-if in such a way that we can reuse proof-based systems. It’s extremely cool that we can do as much as we can working in that fashion, but there’s no reason why we should be surprised that it runs into limits.
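To make the "logical-if standing in for counterfactual-if" point concrete, here is a minimal sketch. It is not taken from any of the posts under discussion; the names `implies`, `action` and `utility` are hypothetical, and the setup simply assumes an agent that in fact takes the $5. Under the material conditional, any statement about the untaken action comes out true, including two that contradict each other when read as counterfactuals.

```python
def implies(p: bool, q: bool) -> bool:
    """Material conditional: false only when p is true and q is false."""
    return (not p) or q


# Hypothetical setup: the agent in fact takes the $5 bill.
action = 5
utility = 5

# Both conditionals about the untaken action hold vacuously, even though
# they contradict each other when read as counterfactual claims.
ten_is_good = implies(action == 10, utility == 10)
ten_is_bad = implies(action == 10, utility == 0)

print(ten_is_good, ten_is_bad)
```

Because the antecedent `action == 10` is false, the conditional places no constraint on the consequent, which is roughly why a proof-based agent can end up "proving" that taking $10 would be bad.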
So I don’t see inventing alternative formalisations that avoid the 5 and 10 problem as particularly hard as the bug is really quite specific to systems that try to utilise this kind of hack. I’d expect that almost any other system in design space will avoid this. So if, as I claim, attempts at formalisation will avoid this issue by default, the fact that any one formalisation avoids this problem shouldn’t give us too much confidence in it being a good system for representing counterfactuals in general.
Instead, I think it’s much more persuasive to ground any proposed system with philosophical arguments (such as your first post was focusing on), rather than mostly just posting a system and observing it has a few nice properties. I mean, your approach in this article is certainly a valuable thing to do, but I don’t see it as getting all the way to the heart of the issue.
Interestingly enough, this mirrors my position in Why 1-boxing doesn’t imply backwards causation where I distinguish between Raw Reality (the territory) and Augmented Reality (the territory augmented by counterfactuals). I guess I put more emphasis on delving into the philosophical reasons for such a view and I think that’s what this post is a bit short on.
Thanks for reading all the posts!
I’m not sure where you got the idea that this was to solve the spurious counterfactuals problem, that was in the appendix because I anticipated that a MIRI-adjacent person would want to know how it solves that problem.
The core problem it’s solving is that it’s a well-defined mathematical framework in which (a) there are, in some sense, choices, and (b) it is believed that these choices correspond to the results of a particular Turing machine. It goes back to the free will vs determinism paradox, and shows that there’s a formalism that has some properties of “free will” and some properties of “determinism”.
A way that EDT fails to solve 5 and 10 is that it could believe with 100% certainty that it takes $5 so its expected value for $10 is undefined. (I wrote previously about a modification of EDT to avoid this problem.)
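The EDT failure described above can be sketched in a few lines. This is only an illustration under assumed names (`conditional_expected_utility`, `beliefs`), not the modification mentioned in the parenthetical: conditioning on an action the agent assigns probability zero requires dividing by zero, so the sketch returns `None` to mark the value as undefined.

```python
from typing import Dict, Optional, Tuple


def conditional_expected_utility(
    joint: Dict[Tuple[int, int], float], action: int
) -> Optional[float]:
    """E[utility | action] from a joint distribution {(action, utility): prob}.

    Returns None when P(action) == 0, because the conditional is undefined.
    """
    p_action = sum(p for (a, _), p in joint.items() if a == action)
    if p_action == 0:
        return None
    weighted = sum(u * p for (a, u), p in joint.items() if a == action)
    return weighted / p_action


# Hypothetical beliefs: the agent is certain that it takes the $5.
beliefs = {(5, 5): 1.0}

ev_five = conditional_expected_utility(beliefs, 5)
ev_ten = conditional_expected_utility(beliefs, 10)
print(ev_five, ev_ten)
```

With these beliefs the agent has a well-defined value for taking $5 and nothing to compare it against, so its certainty is never challenged.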
CDT solves it by constructing physically impossible counterfactuals which has other problems, e.g. suppose there’s a Laplace’s demon that searches for violations of physics and destroys the universe if physics is violated; this theoretically shouldn’t make a difference but it messes up the CDT counterfactuals.
It does look like your post overall agrees with the view I presented. I would tend to call augmented reality “metaphysics” in that it is a piece of ontology that goes beyond physics. I wrote about metaphysical free will a while ago and didn’t post it on LW because I anticipated people would be allergic to the non-physicalist philosophical language.
Thanks for that clarification.
I suppose that demonstrates that the 5 and 10 problem is a broader problem than I realised. I still think that it’s only a hard problem within particular systems that have a vulnerability to it.
Yeah, we have significant agreement, but I’m more conservative in my interpretations. I guess this is a result of me being, at least in my opinion, more skeptical of language. Like I’m very conscious of arguments where someone says, “X could be described by phrase Y” and then later they rely on connotations of Y that weren’t proven.
For example, you write, “From the AI’s perspective, it has a choice among multiple actions, hence in a sense ‘believing in metaphysical free will’”. I would suggest it would be more accurate to write: “The AI models the situation as though it had free will”, which leaves open the possibility that it might be just a pragmatic model, rather than the AI necessarily endorsing itself as possessing free will.
Another way of framing this: there’s an additional step in between observing that an agent acts or models a situation as if it believes in free will and concluding that it actually believes in free will. For example, I might round all numbers in a calculation to integers in order to make it easier for me, but that doesn’t mean that I believe that the values are integers.
Comments on A critical agential account of free will, causation, and physics
We can imagine a situation where there is a box containing an apple or a pear. Suppose we believe that it contains a pear, but it actually contains an apple. If we look in the box (and we have good reason to believe looking doesn’t change the contents), then we’ll falsify our pear hypothesis. Similarly, if we’re told by an oracle that if we looked we would see an apple, then there’d be no need for us to actually look; we’d have heard enough to falsify our pear hypothesis.
However, the situation you’ve identified isn’t the same. Here you aren’t just deciding whether to make an observation or not, but what the value of that observation would be. So in this case, the fact that if you took action B you’d observe the action you took was B doesn’t say anything about the case where you don’t take action B, unlike knowing that if you looked in the box you’d see an apple, which provides you information even if you don’t look in the box. It simply isn’t relevant unless you actually take B.
I think it’s reasonable to suggest starting from falsification as our most basic assumption. I guess where you lose me is when you claim that this implies agency. I guess my position is as follows:
It seems like agents in a deterministic universe can falsify theories in at least some sense. For example, they can take two different weights, drop them, and see that they land at the same time, falsifying the theory that heavier objects fall faster
On the other hand, something like agency or counterfactuals seems necessary for talking about falsifiability in the abstract, as this involves saying that we could falsify a theory if we ran an experiment that we didn’t.
In the second case, I would suggest that what we need is counterfactuals not agency. That is, we need to be able to say things like, “If I ran this experiment and obtained this result, then theory X would be falsified”, not “I could have run this experiment and if I did and we obtained this result, then theory X would be falsified”.
In other words, I think that there is something behind the intuition which I’m guessing led you to these views, but am in favour of developing it in a different direction than you.
I didn’t read past this point, not because I thought it was uninteresting, but because it already took me a while to figure out how to articulate my objections to the article up to this point and I still have to look at one of your posts. But let me know if there’s anything further down more directly related to whether counterfactuals are circular.
The main problem is that it isn’t meaningful for their theories to make counterfactual predictions about a single situation; they can create multiple situations (across time and space) and assume symmetry and get falsification that way, but it requires extra assumptions. Basically you can’t say different theories really disagree unless there’s some possible world / counterfactual / whatever in which they disagree; finding a “crux” experiment between two theories (e.g. if one theory says all swans are white and another says there are black swans in a specific lake, the cruxy experiment looks in that lake) involves making choices to optimize disagreement.
Those seem pretty much equivalent? Maybe by agency you mean utility function optimization, which I didn’t mean to imply was required.
The part I thought was relevant was the part where you can believe yourself to have multiple options and yet be implemented by a specific computer.
Agreed, this is yet another argument for considering counterfactuals to be so fundamental that they don’t make sense outside of themselves. I just don’t see this as incompatible with determinism, b/c I’m grounding using counterfactuals rather than agency.
I don’t mean utility function optimization, so let me clarify what I see as the distinction. I guess I see my version as compatible with the determinist claim that you couldn’t have run the experiment because the path of the universe was always determined from the start. I’m referring to a purely hypothetical running with no reference to whether you could or couldn’t have actually run it.
Hopefully, my comments here have made it clear where we diverge and this provides a target if you want to make a submission (that said, the contest is about the potential circular dependency of counterfactuals and not just my views. So it’s perfectly valid for people to focus on other arguments for this hypothesis, rather than my specific arguments).
I mostly agree with Zack_M_Davis that this is a solved problem, although rather than talking about a formalization of causality I’d say this is a special case of epistemic circularity and thus an instance of the problem of the criterion. There’s nothing unusual going on with counterfactuals other than that people sometimes get confused about what propositions are (e.g. they believe propositions have some sort of absolute truth beyond causality because they fail to realize epistemology is grounded in purpose rather than something eternal and external to the physical world) and then go on to get mixed up into thinking that something special must be going on with counterfactuals due to their confusion about propositions in general.
I don’t know if I’ll personally get around to explaining this in more detail, but I think this is low hanging fruit since it falls out so readily from understanding the contingency of epistemology caused by the problem of the criterion.
Which part are you claiming is a solved problem? Is it:
a) That counterfactuals can only be understood within the counterfactual perspective OR
b) The implications of this for decision theory OR
c) Both
I think A is solved, though I wouldn’t exactly phrase it like that, more like counterfactuals make sense because they are what they are and knowledge works the way it does.
Zack seems to be making a claim to B, but I’m not expert enough in decision theory to say much about it.
Sorry, when you say A is solved, you’re claiming that the circularity is known to be true, right?
Zack seems to be claiming that Bayesian Networks both draw out the implications and show that the circularity is false.
So unless I’m misunderstanding you, your answer seems to be at odds with Zack.
I don’t think they’re really at odds. Zack’s analysis cuts off at a point where the circularity exists below it. There’s still the standard epistemic circularity that exists whenever you try to ground out any proposition, counterfactual or not, but there’s a level of abstraction where you can remove the seeming circularity by shoving it lower or deeper into the reduction of the proposition towards grounding out in some experience.
Another way to put this is that we can choose what to be pragmatic about. Zack’s analysis chooses to be pragmatic about counterfactuals at the level of making decisions, and this allows removing the circularity up to the purpose of making a decision. If we want to be pragmatic about, say, accurately predicting what we will observe about the world, then there’s still some weird circularity in counterfactuals to be addressed if we try to ask questions like “why these counterfactuals rather than others?” or “why can we formulate counterfactuals at all?”.
Also I guess I should be clear that there’s no circularity outside the map. Circularity is entirely a feature of our models of reality rather than reality itself. That’s why, for example, the analysis on epistemic circularity I offer is that we can ground things out in purpose and thus the circularity was actually an illusion of trying to ground truth in itself rather than experience.
I’m not sure I’ve made this point very clearly elsewhere before, so sorry if that’s a bit confusing. The point is that circularity is a feature of the relative rather than the absolute, so circularity exists in the map but not the territory. We only get circularity by introducing abstractions that can allow things in the map to depend on each other rather than the territory.
I wouldn’t be surprised if other concepts such as probability were circular in the same way as counterfactuals, although I feel that this is more than just a special case of epistemic circularity. Like I agree that we can only reason starting from where we are—rather than from the view from nowhere—but counterfactuals feel different because they are such a fundamental concept that appears everywhere. As an example, our understanding of chairs doesn’t seem circular in quite the same sense. That said, I’d love to see someone explore this line of thought.
I could be wrong, but I suspect Zack would disagree with the notion that there is a circularity below it involving counterfactuals. I wouldn’t be surprised though if Zack acknowledged a circularity not involving counterfactuals.
Agreed. That said, I don’t think counterfactuals are in the territory. I think I said before that they were in the map, although I’m now leaning away from that characterisation as I feel that they are more of a fundamental category that we use to draw the map.
Yes, I think there is something interesting going on where human brains seem to operate in a way that makes counterfactuals natural. I actually don’t think there’s anything special about counterfactuals, though, just that the human brain is designed such that thoughts are not strongly tethered to sensory input vs. “memory” (internally generated experience), but that’s perhaps only subtly different than saying counterfactuals rather than something powering them is a fundamental feature of how our minds work.
I think I disagree here. I’m working on an entry to OP’s competition which will contain an argument showing some inherent convergence between different agents’ counterfactuals, due to the structure of the universe.
I think this is just agreement then? That minds are influenced by the structure of the universe they operate in in similar ways sounds like exactly what we should expect. That doesn’t mean we need to elevate such convergence to be something more than intersubjective agreement about reality.
If minds are influenced by the structure of the universe, then that requires some causal structure of the universe to influence them.
Causation is a feature of models, not reality. We need only suppose reality is one thing after another (or not even that! reality is just this moment, which for us contains a sensation we call a memory of past moments), and any causal structure is inferred to exist rather than something we directly observe. I make this argument in some detail here: https://www.lesswrong.com/posts/RMBMf85gGYytvYGBv/no-causation-without-reification
I feel a bit confused.
I agree that causal structure is inferred to exist, and never directly observable. However, the universe has certain properties that make it very hard not to infer a causal structure if we want to model it, in particular:
A constant increase in entropy
Deterministic laws relating the past and the future
… which have symmetry across time and space
It seems exponentially hard to account for this without causality.
When opening the post:
I immediately disagree here. Formally, we usually model causality as our observations being generated by some sort of dynamical system. This cannot be specified with a mathematical notation like implication.
Sure, I know, but that doesn’t mean there’s no dynamical process generating the territory, only that we don’t know which one (and maybe can’t know).
A and B are typically high-level features in our models that simplify the territory; as a result, the causality in our models will also be simplifications of the causality in the territory.
But without causality, I don’t see how you’d get thermodynamics. That seems like a “just is” that is best accounted for causally, even if we don’t have the exact causal theory underlying it. (Somehow, thermodynamics has managed to hold even as we’ve repeatedly updated our models, because it doesn’t depend on the exact causal model, but instead follows from deep aspects of the causal structure of reality.)
But if causality is describing some feature of reality, and the feature it is describing is not itself causal, then what is the feature it is describing?
I’m still puzzled by your puzzlement.
You are treating https://www.greaterwrong.com/posts/T4Mef9ZkL4WftQBqw/the-nature-of-counterfactuals as though it is still an open question, but as far as I can see, all the issues raised were answered in the comments.
I think this is a solved problem. Are you familiar with the formalization of causality in terms of Bayesian networks? (You have enough history on this website that you’ve probably heard of it!)
Make observations using sensors. Abstract your sensory data into variables: maybe you have a weather variable with possible values RAINY and SUNNY, a sprinkler variable with possible values ON and OFF, and a sidewalk variable with possible values WET and DRY. As you make more observations, you can begin to learn statistical relationships between your variables: maybe weather and sprinkler are independent, but conditionally dependent given the value of sidewalk. It turns out that you can summarize this kind of knowledge in the form of a directed graph: weather → sidewalk ← sprinkler. (I’m glossing over a lot of details: a graph represents conditional-independence relationships in the joint distribution over your variables, but the distribution doesn’t uniquely specify a graph.)
But once you’ve learned this graphical model representing the probabilistic relationships between variables which represent abstractions over your sensory observations, then you can construct a similar model that fixes a particular variable to have a particular value, but keeps everything else the same.
Why would you do that? Because such an altered model is useful for decisionmaking if the system-that-you-are is one of the variables in the graph. The way you compute which decision to output is based on a model of how the things in your environment depend on your decision, and it’s possible to learn such a model from previous observations, even though you can’t observe the effects of your current decision in advance of making it.
And that’s what counterfactuals are! I don’t think this is meaningfully circular: we’ve described how the system works in terms of lower-level components. (I’ve omitted a lot of details, but we can totally write computer programs that do this stuff.)
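As a rough illustration of the "fix one variable, keep everything else the same" step, here is a minimal sketch of the weather, sprinkler and sidewalk example. The probabilities, the deterministic rule for the sidewalk, and all function names are assumptions made up for this sketch. It contrasts observing a wet sidewalk (conditioning) with setting the sidewalk to wet by intervention (replacing its mechanism with a constant).

```python
from itertools import product

# Assumed numbers, chosen only for illustration.
P_WEATHER = {"RAINY": 0.3, "SUNNY": 0.7}
P_SPRINKLER = {"ON": 0.5, "OFF": 0.5}


def sidewalk_given(weather: str, sprinkler: str) -> str:
    """Assumed mechanism: the sidewalk is wet if it rains or the sprinkler runs."""
    return "WET" if weather == "RAINY" or sprinkler == "ON" else "DRY"


def joint(do_sidewalk=None):
    """Joint distribution over (weather, sprinkler, sidewalk).

    If do_sidewalk is given, the sidewalk's mechanism is replaced by that
    constant while the rest of the model is left unchanged.
    """
    dist = {}
    for w, s in product(P_WEATHER, P_SPRINKLER):
        side = do_sidewalk if do_sidewalk is not None else sidewalk_given(w, s)
        key = (w, s, side)
        dist[key] = dist.get(key, 0.0) + P_WEATHER[w] * P_SPRINKLER[s]
    return dist


def prob_rainy_given_wet(dist) -> float:
    """P(weather = RAINY | sidewalk = WET) under the given joint distribution."""
    wet = sum(p for (_, _, side), p in dist.items() if side == "WET")
    rainy_and_wet = sum(
        p for (w, _, side), p in dist.items() if w == "RAINY" and side == "WET"
    )
    return rainy_and_wet / wet


observed = prob_rainy_given_wet(joint())
intervened = prob_rainy_given_wet(joint(do_sidewalk="WET"))
print(round(observed, 3), round(intervened, 3))
```

Seeing a wet sidewalk is evidence of rain, so the conditional probability rises above the prior. Hosing the sidewalk down yourself tells you nothing about the weather, so under the intervention the probability of rain stays at its prior.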
I don’t really agree. The idea of using conditional independencies to measure causality is cute in theory, but IME it doesn’t work in practice, for several reasons: things are rarely truly independent, you don’t get enough data to test for independencies in practice, and conditional independence relations are not enough to uniquely identify the causal structure. There’s much more to causality than just conditional independence relations.
Maybe I’m explaining it badly? I’m trying to point to the Judea Pearl thing in my own words. The claim is not that causality “just is” conditional independence relationships. (Pearl repeatedly and explicitly stresses that causal concepts are different from statistical concepts and require stronger assumptions.) Do you have an issue with the graph formalism itself (as an explanation of the underlying reality of how causality and counterfactuals work), separate from practical concerns about how one would learn a particular graph?
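The claim that conditional independence relations do not pin down the causal structure can be checked numerically. The sketch below uses made-up probability tables over three binary variables; it builds the joint distribution of a chain X → Y → Z, re-factorises it as a fork X ← Y → Z, and confirms that the two graphs describe exactly the same distribution, so no amount of observational data could tell them apart.

```python
from itertools import product

# Hypothetical conditional probability tables for the chain X -> Y -> Z.
P_X = {0: 0.6, 1: 0.4}
P_Y_GIVEN_X = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
P_Z_GIVEN_Y = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.25, 1: 0.75}}

chain = {
    (x, y, z): P_X[x] * P_Y_GIVEN_X[x][y] * P_Z_GIVEN_Y[y][z]
    for x, y, z in product((0, 1), repeat=3)
}

# Derive P(y) and P(x | y) from the chain's joint distribution.
p_y = {y: sum(p for (_, y2, _), p in chain.items() if y2 == y) for y in (0, 1)}
p_x_given_y = {
    y: {
        x: sum(p for (x2, y2, _), p in chain.items() if x2 == x and y2 == y) / p_y[y]
        for x in (0, 1)
    }
    for y in (0, 1)
}

# Re-factorise the same joint as the fork X <- Y -> Z.
fork = {
    (x, y, z): p_y[y] * p_x_given_y[y][x] * P_Z_GIVEN_Y[y][z]
    for x, y, z in product((0, 1), repeat=3)
}

max_gap = max(abs(chain[k] - fork[k]) for k in chain)
print(max_gap < 1e-12)
```

The two graphs still disagree about interventions: setting X by hand changes Y in the chain but not in the fork, which is the extra content that the statistics alone do not supply.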
Partly it’s explaining it badly. In addition to the points listed above, there’s also issues like focusing entirely on rung 2 causality and disregarding rung 3 causality, which is arguably the truer kind of causality.
I assume that here we are understanding the graph formalism sufficiently broadly as to include e.g. differential equations, as otherwise there’s definitely a problem already there. And in the same vein, for most problems both DAGs and differential equations are too rigid/vector-spacey to work, and we probably need new formalisms that can better handle systems with varying structure of variables.
Regardless, I don’t think the question of how one would learn a particular graph is merely a practical concern; it’s the core part. Not just learning the edges between the vertices, but also selecting the variables that are supposed to feature in the graphs. In fact I suspect once we have a good understanding of representation learning, we will see that causal structure learning follows mostly from the representations we choose, because the things that make certain functions interesting as features tend to be the causal effects they have.
As far as I know, most of the focus of the causal inference literature is on effect size estimation. Which is probably important too, but it’s not really the hard part that OP is asking about. As far as I know, it only has slight focus on causal structure learning, and the typical advice seems to be to have human experts do the causal structure specification. And as far as I know, they don’t have an answer at all to representation learning. (Instead, John Wentworth seems to be the hero who is working on a solid theory for this.)
Yeah, I’m aware of Bayesian Networks.
Two points:
Bayesian Networks don’t solve Newcomb’s problem, but I assume you’re aware of that. So I’m guessing your point is that if standard counterfactuals can be constructed outside of the counterfactual perspective, then more general counterfactuals would most likely be the same?
Does the concept of a variable even make sense without counterfactuals? It’s not immediately obvious that it does, although I haven’t thought through this enough to assert that it doesn’t.
Update: Having spent a few minutes thinking this through, I’ve concluded that the concept of a variable over time, or a variable over space, etc. makes sense without counterfactuals. However, this is a more limited notion of variable than the one we normally deal with: if, for example, the variable L representing the state of a lightswitch is “ON” at t=0, then we wouldn’t have the notion that it could have been “OFF” instead.
Update 2: Upon further thought, this seems more limited than I first thought. For example, we can’t say let a be how many apples there would be at time t if we counted them, because “if we counted them” is invoking counterfactual reasoning, unless we really did count the apples at each time period. In any case, the issue of whether or not Bayesian Networks are circular seems to be complex enough that it is deserving of further investigation.
Counterfactuals (in the potential outcome sense used in statistics) and Pearl’s structural equation causality semantics are equivalent.
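As a small illustration of how the two vocabularies line up (it shows the correspondence in one toy case and is not a proof of the equivalence), here is a sketch with a single hypothetical structural equation Y = 2X + U_y. The equation, the numbers and the function names are assumptions made up for this example. The potential outcome Y_x(u) is just the structural equation evaluated at X = x with the unit's noise term u held fixed.

```python
def f_y(x, u_y):
    """Hypothetical structural equation for Y: Y = 2*X + U_y."""
    return 2.0 * x + u_y


def abduct_u_y(x_obs, y_obs):
    """Abduction: recover this unit's noise term from what was observed."""
    return y_obs - 2.0 * x_obs


def counterfactual_y(x_obs, y_obs, x_new):
    """Pearl-style counterfactual: abduction, then action, then prediction."""
    u_y = abduct_u_y(x_obs, y_obs)  # abduction
    return f_y(x_new, u_y)          # action (set X = x_new) and prediction


def potential_outcomes(u_y, treatments=(0.0, 1.0)):
    """Potential-outcome view: the table {x: Y_x(u)} for one unit u."""
    return {x: f_y(x, u_y) for x in treatments}


# One unit observed with X = 1 and Y = 3.
y_cf = counterfactual_y(x_obs=1.0, y_obs=3.0, x_new=0.0)
po = potential_outcomes(abduct_u_y(1.0, 3.0))
print(y_cf, po)
```

Reading off `po[0.0]` from the potential-outcome table and running the abduction, action, prediction recipe give the same number for this unit.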
What are your thoughts on Newcomb’s, etc.?
I gave a talk at FHI ages ago on how to use causal graphs to solve Newcomb type problems. It wasn’t even an original idea: Spohn had something similar in 2012.
I don’t think any of this stuff is interesting, or relevant for AI safety. There’s a pretty big literature on model robustness and algorithmic fairness that uses causal ideas.
If you want to worry about the end of the world, we have climate change, pandemics, and the rise of fascism.
Why did you give a talk on causal graphs if you didn’t think this kind of work was interesting or relevant? Maybe I’m misunderstanding what you’re saying isn’t interesting or relevant.
How much are you interested in a positive vs normative theory of counterfactuals? For example, do you feel like you understand how humans do counterfactual reasoning, and how and why it works for them (insofar as it works for them)? If not, is such an understanding what you’re looking for? Or do you think humans are not perfect at counterfactual reasoning (e.g. maybe because people disagree with each other about Newcomb’s problem etc.) and there’s some deep notion of “correct counterfactual reasoning” that humans are merely approximating, and the deeper “correct” thing is what you really care about?
(For my part I’m somewhat skeptical that there is a notion of counterfactuals that is fundamentally different from and better than what humans do.)
Update: I should further clarify that even though I provided a rough indication of how important I consider various approaches, this is off-the-cuff and I could be persuaded an approach was more valuable than I think, particularly if I saw good quality work.
I guess my ultimate interest is normative as the whole point of investigating this area is to figure out what we should do.
However, I am interested in descriptive theories insofar as they can contribute to this investigation (and not insofar as the details aren’t useful for normative theories). For example, when I say that counterfactuals only make sense from within the counterfactual perspective and further that counterfactuals are ultimately grounded as an evolutionary adaption I’m making descriptive statements. The latter seems to be more of a positive statement, while the former doesn’t seem to be (it seems to be justified by philosophical reasoning more than empirical investigation). In any case, it feels like there is more work to be done in taking these high-level abstract statements and making them more precise.
I think that further investigation here could be useful—although not in the sense that 40% use this style of reasoning and 60% use this style—exact percentages aren’t the relevant things here—at least not at this early stage. I’d also lean towards saying that how experts operate is more important than average humans and that the behavior of especially stupid humans is probably of limited importance.
I guess I see the behaviour of normal humans mattering for two reasons:
a) Firstly because I see making use of counterfactuals as evolutionarily grounded (in a more primitive form than the highly cognitive and mathematically influenced versions that we tend to use on LW)
b) Secondly because the experts are more likely to discard intuitions that don’t agree with their theories. And I think we need to use our reasoning to produce a consistent theory from our intuitions at some point, but this may be less than ideal if we’re simply trying to collect various intuitions as raw data to later turn into a theory.
I should clarify: in the above discussion, I’m commenting on what I’m interested in, rather than what’s in scope. The scope of the prize is the proposition that counterfactuals only make sense within themselves. And I guess part of what I was trying to clarify above is that empirical investigation can be relevant when carefully chosen. Happy to provide additional clarification if you were planning to submit a post covering something specific.
I guess my position on this is complex as I believe that counterfactuals only make sense in terms of themselves. So I don’t think there is a “true” notion of counterfactuals that exists within the ontology, rather I see them as a heuristic ultimately grounded by evolution. That said, our instinct to systematise and use logic to make things more coherent is also grounded in evolution.
People often hold vastly different perspectives on what counts as “fundamentally different” from something else. That said, I believe we should one-box on Newcomb’s problem (do you?) and I guess that seems fundamentally different from how humans who are trained on traditional decision theory/classical physics think. On the other hand, it may not be fundamentally different from how more untutored and instinctual individuals would behave. I guess I’d be curious where you stand here.
I think brains build a generative world-model, and that world-model is a certain kind of data structure, and “counterfactual reasoning” is a class of operations that can be performed on that data structure. (See here.) I think that counterfactual reasoning relates to reality only insofar as the world-model relates to reality. (In map-territory terminology: I think counterfactual reasoning is a set of things that you can do with the map, and those things are related to the territory only insofar as the map is related to the territory.)
I also think that there are lots of specific operations that are all “counterfactual reasoning” (just as there are lots of specific operations that are all “paying attention”—paying attention to what?), and once we do a counterfactual reasoning operation, there are also a lot of things that we can do with the result of the operation. I think that, over our lifetimes, we learn metacognitive heuristics that guide these decisions (i.e. exactly what “counterfactual reasoning”-type operations to do and when, and what to do with the result of the operation), and some people’s learned metacognitive heuristics are better than others (from the perspective of achieving such-and-such goal).
Analogy: If you show me a particular trained ConvNet that misclassifies a particular dog picture as a cat, I wouldn’t say that this reveals some deep truth about the nature of image classification, and I wouldn’t conclude that there is necessarily such a thing as a philosophically-better type of image classifier that fundamentally doesn’t ever make mistakes like that. (The brain image classifier makes mistakes too, albeit different mistakes than ConvNets make, but that’s beside the point.) Instead I would be more inclined to look for a very complicated explanation of the mistake, related to details of its training data and so on.
By the same token: if someone makes a poor decision on Newcomb’s problem, I don’t think that reveals some deep truth about the nature of counterfactual reasoning, and I wouldn’t conclude that there is necessarily such a thing as a philosophically-better type of counterfactual reasoning that fundamentally doesn’t ever make mistakes like that. Instead I would be more inclined to look for a very complicated explanation of the mistake, related to the person’s life history, exactly how Newcomb’s problem was explained to them, exactly what their learned world-model looks like, etc.
And if I wanted to build an AGI that performed well on Newcomb’s problem, I would build the AGI first, and then have the AGI read Eliezer’s essays or whatever, same as if I wanted my (human) friend to perform well on Newcomb’s problem. :-)
Agreed. This is definitely something that I would like further clarity on.
I guess the real-world reasons for a mistake are sometimes not very philosophically insightful (i.e. Bob was high when reading the post, James comes from a Spanish speaking background and they use their equivalent of a word differently than English-speakers, Sarah has a terrible memory and misremembered it).
I’m guessing your position might be that there are just mistakes and there aren’t mistakes that are more philosophically fruitful or less fruitful? There’s just mistakes. Is that correct? Or were you just responding to my specific claim that it might be useful to know how the average person responds to problems because we are evolved creatures? If so, then I definitely agree that we’d have to delve into the details and not just remain on the level of averages.
Update: Actually, I’ll add an analogy that might be helpful. Let’s suppose you didn’t know what a dog was. Actually, that’s kind of the case: once you start diving into any definition you end up running into fuzzy cases, such as whether a robotic dog counts as a dog. Then if humans had built a bunch of different classifiers and you didn’t have access to the humans (say they went extinct), you might want to analyse the different classifiers to try to figure out how humans defined the term dog, even though much of the behaviour might only tell you about the errors their flaws tend to produce rather than about the human concept.
Similarly, we don’t have exact access to our evolutionary history, but examining human intuitions about counterfactuals might provide insights about which heuristics have worked well, whilst also recognising that it’s hard, arguably impossible, to even talk about “working well” without embracing the notion of counterfactuals. And I agree that there are probably different ways we could emphasise various heuristics rather than a unique, principled solution.
I’m not claiming the situation is precisely this—in fact I’m not sure exactly how useful this analogy is—but I think it’s worth sharing anyway in case it lands.
Hmm, my hunch is that you’re misunderstanding me here. There are a lot of specific operations that are all “making a fist”. I can clench my fingers quickly or slowly, strongly or weakly, left hand or right hand, etc. By the same token, if I say to you “imagine a rainbow-colored tree; are its leaves green?”, there are a lot of different specific mental models that you might be invoking. (It could have horizontal rainbow stripes on the trunk, or it could have vertical rainbow stripes on its branches, etc.) All those different possibilities involve constructing a counterfactual mental model and querying it, in the same nuts-and-bolts way. I just meant, there are many possible counterfactual mental models that one can construct.
Suppose I ask “There’s a rainbow-colored tree somewhere in the world; are its leaves green?” You think for a second. What’s happening under the surface when you think about this? Inside your head are various different models pushing in different directions. Maybe there’s a model that says something like “rainbow-colored things tend to be rainbow-colored in all respects”. So maybe you’re visualizing a rainbow-colored tree, and querying the color of the leaves in that model, and this model is pushing on your visualized tree and trying to make it have a color scheme that’s compatible with the kinds of things you usually see, e.g. in cartoons, which would be rainbow-colored leaves. But there’s also a botany model that says “tree leaves tend to be green, because that’s the most effective for photosynthesis, although there are some exceptions like Japanese maples and autumn colors”. In scientifically-educated people, probably there will also be some metacognitive knowledge that principles of biology and photosynthesis are profound deep regularities in the world that are very likely to generalize, whereas color-scheme knowledge comes from cartoons etc. and is less likely to generalize.
So what’s at play is not “the nature of counterfactuals”, but the relative strengths of these three specific mental models (and many more besides) that are pushing in different directions. The way it shakes out will depend on the particular person and their life experience (and in particular, how much of a track-record of successful predictions these models have built up in similar contexts).
By the same token, I think every neurotypical human thinking about Newcomb’s problem is using counterfactual reasoning, and I think that there isn’t any interesting difference in the general nature of the counterfactual reasoning that they’re using. But the mental model of free will is different in different people, and the mental model of Omega is different in different people, etc.
Hmm, maybe we’re talking past each other a bit because of the learning-algorithm-vs-trained-model division. Understanding the learning algorithm is like being able to read and understand the source code for a particular ML paper (and the PyTorch source code that it calls in turn). Understanding the trained model is like OpenAI microscope.
(It’s really “learning algorithm & inference algorithm”—the first changes the parameters, the second chooses what to do right now. I’m just calling it “learning algorithm” for short.)
I usually take the perspective that “the main event” is to understand the learning algorithm, because that’s what you need to build AGI, and that’s what the genome needs to build humans (thanks to within-lifetime learning), whereas understanding the trained model is “a sideshow”, unnecessary for building AGI, but still worth talking about for safety and whatnot.
On the “learning algorithm” side, I put “the basic capability to do counterfactual reasoning operations”. On the “trained model” side, I put all the learned heuristics about how reliable counterfactual reasoning is under what circumstance, and also all the learned concepts that go into a particular “counterfactual reasoning” operation (e.g. botany concepts, free will concepts, etc.)
Then when I brashly declare “I basically understand counterfactual reasoning”, I’m just talking about the stuff on the “learning algorithm” side. Whereas it seems that you feel like your project is to understand stuff on both sides—not only what a “counterfactual reasoning” operation is at a nuts-and-bolts level, but also all the other things that go into Newcomb’s problem, like whether there’s a “free will” concept in the world-model and what other concepts it’s connected to and how strongly (all of which can impact the results of a “counterfactual reasoning” operation). Then that research program seems to me to be more about normative decision theory and epistemology (e.g. “what to do in Newcomb’s problem”), rather than about the nature of counterfactual reasoning per se. Or I guess perhaps what you’re going for is closer to “practical advice that helps adult humans use counterfactual reasoning to reach correct conclusions”? In that case I’d be a bit surprised if there was much generically useful advice like that; I would expect that the main useful thing is object-level stuff like teaching better intuitions about the nature of free will etc.
I agree that there isn’t a single uniquely correct notion of a counterfactual. I’d say that we want different things from this notion and there are different ways to handle the trade-offs.
I find this confusing as CDT counterfactuals where you can only project forward seem very different from things like FDT where you can project back in time as well.
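To show how much hangs on which counterfactual you use, here is a rough sketch of Newcomb’s problem evaluated two ways. Everything in it is an illustrative assumption: the usual $1,000,000 / $1,000 payoffs, the predictor accuracy, and the two function names. The second evaluator is only a crude stand-in for “the prediction moves with the decision”, not a faithful implementation of FDT.

```python
MILLION = 1_000_000  # assumed contents of the opaque box when it is full
THOUSAND = 1_000     # assumed contents of the transparent box


def payoff(action, box_full):
    """Money received for an action given the opaque box's contents."""
    opaque = MILLION if box_full else 0
    return opaque + (THOUSAND if action == "two-box" else 0)


def ev_contents_fixed(action, q_full):
    """Forward-only counterfactual: the box contents are already settled
    (full with credence q_full) and do not vary with the action."""
    return q_full * payoff(action, True) + (1 - q_full) * payoff(action, False)


def ev_prediction_tracks_action(action, accuracy):
    """Counterfactual that also 'projects back': varying the action varies
    the prediction, which is correct with probability `accuracy`."""
    p_full = accuracy if action == "one-box" else 1 - accuracy
    return p_full * payoff(action, True) + (1 - p_full) * payoff(action, False)


for action in ("one-box", "two-box"):
    print(action,
          ev_contents_fixed(action, q_full=0.5),
          ev_prediction_tracks_action(action, accuracy=0.99))
```

Under the first evaluator two-boxing comes out ahead by exactly the thousand, whatever `q_full` is; under the second, one-boxing wins for any reasonably accurate predictor. The arithmetic is identical in both cases and only the counterfactual differs.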
Well, we need the information encoded in our DNA rather than what is actually implemented in humans (clarification: what is implemented in humans is significantly influenced by society), though we aren’t at the level where we can access that by analysing the DNA directly or people’s brain structure for that matter, so we have to reverse engineer it from behaviour.
I’ve very much focused on trying to understand how to solve these problems in theory rather than how can we correct any cognitive flaws in humans or on how to adapt decision theory to be easier or more convenient to use.
In so far as I’m interested in how average humans reason counterfactually, it’s mostly about trying to understand the various heuristics that are the basis of counterfactuals. I guess I believe that we need counterfactuals to understand and evaluate these heuristics, but I guess I’m hoping that we can construct something reflexively consistent.
I think there is “machinery that underlies counterfactual reasoning” (which incidentally happens to be the same as “the machinery that underlies imagination”). My quote above was saying that every human deploys this machinery when you ask them a question about pretty much any topic.
I was initially assuming (by default) that if you’re trying to understand counterfactuals, you’re mainly trying to understand how this machinery works. But I’m increasingly confident that I was wrong, and that’s not in fact what you’re interested in. Instead it seems that your interests are more like “how would an AI, equipped with this kind of machinery, reach correct conclusions about the world?” (After all, the machinery by itself can lead to both correct and incorrect conclusions—just as “thinking / reasoning in general” can lead to correct or incorrect conclusions.)
Given what (I think) you’re trying to do above, I’m somewhat skeptical that you’ll make progress by thinking about the philosophical nature of counterfactuals in general. I don’t think there’s a clean separation between “good counterfactual reasoning” and “good reasoning in general”. If I say some counterfactual nonsense like “If the Earth were a flat disk, then the north pole would be in the center,” I think the reason it’s nonsense lives at the object-level, i.e. the detailed content of the thought in the context of everything else we know about the world. I don’t think the problem with that nonsense thought can be diagnosed at the meta-level, i.e. by examining structural properties of its construction as a counterfactual or whatever.
So by the same token, I think that “what counterfactuals make sense in the context of decision-making” is a decision theory question, not a counterfactuals question, and I expect a good answer to look like explicit discussions of decision theory as opposed to looking like a more general discussion of the philosophical nature of counterfactuals. (That said, the conclusion of that decision theory discussion could certainly look like a prescription on the content of counterfactual reasoning in a certain context, e.g. maybe the decision theory discussion concludes with ”...Therefore, when making decisions, use FDT-type counterfactuals” or whatever.)
I agree that counterfactual reasoning is contingent on certain brain structures, but I would say the same about logic as well and it’s clear that the logic of a kindergartener is very different from that of a logic professor—although perhaps we’re getting into a semantic debate—and what you mean is that the fundamental machinery is more or less the same.
Yeah, this seems accurate. I see understanding the machinery as the first step towards the goal of learning to counterfactually reason well. As an analogy, suppose you’re trying to learn how to reason well. It might make sense to figure out how humans reason, but if you want to build a better reasoning machine and not just duplicate human performance, you’d want to be able to identify some of these processes as good reasoning and some as biases.
I guess I don’t see why there would need to be a separation in order for the research direction I’ve suggested to be insightful. In fact, if there isn’t a separation, this direction could even be more fruitful as it could lead to rather general results.
I would say (as a slight simplification) that our goal in studying counterfactual reasoning should be to get counterfactuals to a point where we can answer questions about them using our normal reasoning.
That post certainly seems to contain an awful lot of philosophy to me. And I guess even though this post and my post On the Nature of Counterfactuals don’t make any reference to decision theory, that doesn’t mean that it isn’t in the background influencing what I write. I’ve written a lot of posts here, many of which discuss specific decision theory questions.
I guess I would still consider Joe Carlsmith’s post a high-quality post if it had focused exclusively on the more philosophical aspects. And I guess philosophical arguments are harder to evaluate than mathematical ones and it can be disconcerting for some people, especially those used to the certainty of mathematics, but I believe it’s possible to get to the level where you can avoid formalising things a lot of the time because you have enough experience to know how things will shake out.
Although I suppose in this case my reason for avoiding formalisation is that I see premature formalisation as a critical error. Once someone has produced a formal theory they will feel psychologically compelled to defend it, especially if it is mathematically beautiful, so I believe it’s important to be very careful about making sure the assumptions are right before attempting to formalise anything.
My entry. Ultimately I’m not sure whether I agree or disagree with your point, but I hope I’ve brought up some valuable things.
I’m not sure how strong you are in physics; the “Causality is real, counterfactuals are not” section is a brief summary of some fairly abstract and general properties of physics, so we might need to discuss it further in the comments if they do not immediately ring true to you.
Thanks for your submission. I’m still thinking about it, but I really appreciated how your entry engaged with the topic.
Yeah, I did at one point have a brief passing thought that you could make an argument along the lines you followed (that counterfactuals are a construction, but that they are built on top of underlying rules of the universe which have a real existence). Ideally, I would have thought through this line of thought before writing The Nature of Counterfactuals, but I lack the patience to spend a long time polishing before I release a post, so I mentally tagged it as something to think more about later.
I guess one reason why I might have tagged this as a “later” thought is that I’m still trying to figure out my way around the debate between those who believe that the universe has laws vs. the more Humean perspective that things just are.
Thanks for developing this perspective. At the very least, it’ll provide a more solid target for me to engage with (vs. the vague intuition I had that an argument along these lines might be viable), but it’s also possible that I may come to agree with it after I’ve thought it through.
I’m not familiar with Hume’s philosophy, but the idea that “things just are” without being restricted to follow some patterns/laws seems to lose badly in a Bayesian way to theories which accept the laws that exist.
Perhaps, I’ve only heard them vaguely, second-hand, so I’m reluctant to take a position on this yet.
I’ve spent some hours yesterday writing an entry for this competition, but before publishing it I thought it might be best for me to try to talk briefly about my thoughts in a comment here. I think my post would go under this heading:
Specifically, I think I disagree with your thesis that counterfactuals only make sense from a counterfactual perspective. Here’s a sketch of my reason (which I will go into more detail with later):
Humans evaluate our counterfactuals with our brains and experience. But we are agents that arise from a different optimization process, namely evolution. Evolution has used something like counterfactuals to design us, namely it has had a whole bunch of organisms with different lives live out, and then it has selected those that reproduced.
This gives an answer to “Why don’t agents construct crazy counterfactuals?”; those who did construct crazy counterfactuals did not reproduce, while those who constructed reasonable counterfactuals did reproduce. This is a counterfactual-independent fact of history (though understanding why this fact happens to be the case is probably easier if you do have counterfactuals).
In my post, I will give an argument that if you design an agent from one counterfactual perspective, then the optimally designed agent will inherit some of the counterfactual nature from your perspective and from the structure of reality.
Interestingly enough I simultaneously hold that both:
a) Counterfactuals only make sense from within themselves
b) Counterfactuals are grounded by being an evolutionary adaption
Given this, I just wanted to encourage you to make sure that you don’t assume that it must be a) OR b), not both, without arguing for these being mutually exclusive possibilities.
It’s possible that evolution may provide us with a notion of counterfactuals that aren’t recursively dependent upon themselves, although this would have to overcome the challenge of talking about evolution without invoking counterfactuals.
Anyway, looking forward to reading your post.
Hm, now I wonder if I should try to come up with a causally-incorrect account of evolutionary history that still makes the same distributional predictions that the causally-correct ones do. This seems like it could produce a perspective on how different counterfactual models would interpret the grounding of our counterfactuals.
Because ultimately evolution is just a feature of the universe that any theory must account for, whether it makes causally correct predictions or not.
Update: coming up with an alternative account of the causality involved in evolutionary history is actually… Really hard? Which of course is to be expected because I’m essentially trying to come up with a false theory that can account for a real phenomenon. But I think there might be something to be learned about the nature of causality from the difficulty of coming up with alternative causal explanations for evolutionary history, even though any set of mere observations should in theory be able to have an infinitude of causal explanations.
Aha, I think I’ve got it! Assuming “reasonable” theories, there is only one notion of causality that allows you to talk about the causal effects of an organism’s genetics. Lemme explain:
One way we could create incorrect causal accounts of evolution would be to break various physical symmetries. Just because a ball falls to the ground when I drop it does not mean it would have fallen to the ground if it had been dropped elsewhere; thus maybe our universe is extremely causally unusual, because it just happens to “thread the needle” between an infinitude of states where the laws of nature would have been entirely different.
The above sort of approach would permit pretty much any kind of counterfactual, but it would also be completely unable to explain why our universe just happens to thread the needle so perfectly.
(One might imagine that one could explain it with a “common cause” model, since after all confounding is the big alternative to direct causal effects. However, the common cause would have to encode the entire trajectory of our universe, which is an enormous amount of information; this just makes the problem recursive, in that one then needs to come up with a causal model to explain this information.)
So an account which breaks the equational laws of physics needs to appeal to a leap of faith on the order of all of the complexity of the entire universe’s trajectory, which seems “unreasonable” to me—if nothing else, it doesn’t seem computationally viable to represent such accounts.
But the laws of physics can be seen as non-causal equations, rather than as causal effects; generally they’re directly reversible, and even when they are not, they are still bijective and volume-preserving. That is, you can take any physical state and extrapolate it backwards, not just forwards. And you can also take a complicated jumble of pieces of physical states across different times, and find trajectories that trace through them.
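A toy illustration of “bijective, so you can extrapolate backwards as well as forwards”: the sketch below uses a made-up update rule on a small discrete phase space (position q and momentum p, both taken mod N). It is not real physics, and `force`, `step_forward` and `step_backward` are hypothetical names. The point is only that every state has exactly one successor and exactly one predecessor, so the equations themselves do not single out a direction.

```python
N = 101  # size of the toy discrete phase space (an arbitrary choice)


def force(q):
    """Arbitrary 'force law'; any function of q keeps the map invertible."""
    return (q * q) % N


def step_forward(state):
    """Kick the momentum using the old position, then drift the position."""
    q, p = state
    p_new = (p + force(q)) % N
    q_new = (q + p_new) % N
    return (q_new, p_new)


def step_backward(state):
    """Exact inverse of step_forward: undo the drift, then undo the kick."""
    q_new, p_new = state
    q = (q_new - p_new) % N
    p = (p_new - force(q)) % N
    return (q, p)


def run(state, steps, step):
    """Apply `step` to `state` the given number of times."""
    for _ in range(steps):
        state = step(state)
    return state


start = (3, 7)
later = run(start, 50, step_forward)
recovered = run(later, 50, step_backward)

# Every one of the N*N states is hit exactly once, so the map is a bijection.
images = {step_forward((q, p)) for q in range(N) for p in range(N)}
is_bijection = len(images) == N * N
print(later, recovered, is_bijection)
```

Nothing stops you from picking any state, treating it as “now”, and running `step_backward` from it; which direction counts as the causal one has to come from somewhere other than the update rule.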
So you could, for instance, pick the state of the universe right now, and consider the causal model that reverses time; your counterfactuals when changing a variable would yield the universe trajectory that ends up in the modified state, rather than the universe trajectory that results from the modified state. This only has two problems, a minor one and a major one.
The minor problem is that this would break thermodynamics and with it, make most counterfactuals useless; any “backwards counterfactual” would put you on a trajectory where the past is not lower entropy than the future, because the reason entropy is increasing is because we started out in low entropy and most states are high entropy; it’s not possible to apply a counterfactual to a state and still have it extrapolate backwards to something low-entropy. (… I believe?)
But the major problem is that this would break counterfactuals with respect to genes. That is, in this physically-backwards model, an organism’s genes are not determined by what it inherits at conception, but rather by an entropic common-cause conspiracy that happens as it “undecays from death” and its genes “magically” assemble in each of its cells. Since the genes at conception would now be a common consequence rather than a common cause of the organism, you can no longer talk about the effects on the organism of switching out its genes at conception.
But, you might think there is a solution to this: Instead of picking a timeslice or something similar to that, you could pick each organism’s genes at the moment of its conception (combined with some arbitrary extra info to make it pick out a unique universe trajectory), and have the universe grow causally from there.
I think this also soooorta breaks counterfactuals with respect to genes, but not as badly as before. Specifically, counterfactuals with respect to currently existing organisms’ genes work just fine. But if you do a counterfactual with respect to them, then I think that would lead to there being new organisms not previously accounted for, and I think counterfactuals with respect to these new organisms’ genes would be just as broken as if you had done backwards causality. So here, you end up with a symmetry broken; counterfactuals with respect to the original organisms end up working differently than counterfactuals with respect to the new organisms.
I’m kind of confused here. I can understand individual sentences, but not where you’re going as a whole. So your aim here is to figure out why causality is forwards and not backwards? If not, what do you mean by there only being one notion of causality that allows threading the needle?
I was thinking about the question “Why don’t agents construct crazy counterfactuals?”, and decided that I wanted a clearer idea of what crazy counterfactuals would look like in the case of evolution. As in, if you asked someone who had a crazy set of counterfactuals what would have happened if some organism had had some different DNA, what would they answer?
Okay, that makes more sense now! I’ll try to circle back and take a look at your original comment again when I have time.
I think perhaps one distinction that needs to be made is between “counterfactuals exist only in our imagination” and “causality exists only in our imagination”.
Counterfactuals definitely exist only in our imagination. We’re literally making up some modified version of the world, and then extrapolating its imaginary consequences.
Often, we might define causality in terms of counterfactuals; “X causes Y if Y has a counterfactual dependence on X”. So in that sense we might imagine that causality too only exists in our imagination.
But at least in the Pearlian paradigm, it’s actually the opposite way around. You start with some causal (dynamical) system, and then counterfactuals are defined to be made-up/“mutilated” versions of that system. The reason we use counterfactuals in the Pearlian paradigm is that they are a convenient interface for “querying” the aggregated properties of causality.
I’d argue that there is some real underlying causality that generates the universe. Though it’s easy to be confused about this, because we do not have direct access to this causality; instead we always think about massively simplified caricatural models, which boil the enormous complexity of reality down into something manageable.
Yeah, sounds like a plausible theory.
Oh hey, I already have slides for this.
Here you go: https://www.lesswrong.com/posts/vuvS2nkxn3ftyZSjz/what-is-a-counterfactual-an-elementary-introduction-to-the
I took the approach: if I very clearly explain what counterfactuals are and how to compute them, then it will be plain that there is no circularity. I attack the question more directly in a later paragraph, when I explain how counterfactuals can be implemented in terms of two simpler operations: prediction and intervention. And that’s exactly how it is implemented in our causal probabilistic programming language, Omega (see http://www.zenna.org/Omega.jl/latest/ or https://github.com/jkoppel/omega-calculus ).
Unrelatedly, if you want to see some totally-sensible but arguably-circular definitions, see https://en.wikipedia.org/wiki/Impredicativity .
Hey Darmani, I enjoyed reading your post—it provides a very clear explanation of the three levels of the causal hierarchy—but it doesn’t seem to really engage with the issue of circularity.
I guess the potential circularity becomes important when we start asking the question of how to model taking different actions. After intervening on our decision node do we just project forward as per Causal Decision Theory or do we want to do something like Functional Decision Theory that allows back-projecting as well? If it’s the latter, how exactly do we determine what is subjunctively linked to what?
When trying to answer these questions, this naturally leads us to ask, “What exactly are these counterfactual things anyway?” and that path (in my opinion) leads to circularity.
These issues seem to occur even in situations when we know perfectly how to forwards predict and where we are given sufficient information that we don’t need to use abduction.
Anyway, thanks for your submission! I’m really happy to have at least one submission already.
I’m not surprised by this reaction, seeing as I jumped on banging it out rather than checking to make sure that I understand your confusion first. And I still don’t understand your confusion, so my best hope was giving a very clear, computational explanation of counterfactuals with no circularity in hopes it helps.
Anyway, let’s have some back and forth right here. I’m having trouble teasing apart the different threads of thought that I’m reading.
I think I’ll need to see some formulae to be sure I know what you’re talking about. I understand the core of decision theory to be about how to score potential actions, which seems like a pretty separate question from understanding counterfactuals.
More specifically, I understand that each decision theory provides two components: (1) a type of probabilistic model for modeling relevant scenarios, and (2) a probabilistic query that it says should be used to evaluate potential actions. Evidential decision theory uses an arbitrary probability distribution as its model, and evaluates actions by P(outcome | action). Causal decision theory uses a causal Bayes net (a set of interventional distributions) and the query P(outcome | do(action)). I understand FDT less well, but basically view it as similar to CDT, except that it intervenes on the input to a decision procedure rather than on the output.
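To make the difference between the two queries concrete, here is a minimal sketch in Python. The model is a hypothetical toy, not something from this thread: a hidden trait `u` influences both the action and the outcome, and all the numbers and function names (`p_action`, `p_good`, `edt_score`, `cdt_score`) are assumptions chosen for illustration.

```python
from fractions import Fraction

# Hypothetical toy model: a hidden trait u is present with probability 1/2.
P_U = {True: Fraction(1, 2), False: Fraction(1, 2)}

def p_action(a, u):
    """P(action = a | u): agents with the trait take the action 9 times in 10."""
    p_act = Fraction(9, 10) if u else Fraction(1, 10)
    return p_act if a else 1 - p_act

def p_good(u, a):
    """P(good outcome | u, action = a): mostly the trait, slightly the action."""
    base = Fraction(8, 10) if u else Fraction(2, 10)
    return base + (Fraction(1, 10) if a else 0)

def edt_score(a):
    """P(good | action = a): condition on the action, so it also carries news about u."""
    joint = sum(P_U[u] * p_action(a, u) * p_good(u, a) for u in P_U)
    marginal = sum(P_U[u] * p_action(a, u) for u in P_U)
    return joint / marginal

def cdt_score(a):
    """P(good | do(action = a)): set the action by fiat, leaving the prior on u untouched."""
    return sum(P_U[u] * p_good(u, a) for u in P_U)

if __name__ == "__main__":
    print("P(good | action)     =", edt_score(True))
    print("P(good | do(action)) =", cdt_score(True))
```

In this toy model the conditional query gives 21/25 while the interventional query gives 3/5. The gap comes entirely from the action being evidence about the hidden trait, which is the part that do() discards.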
But all this is separate from the question of how to compute counterfactuals, and I don’t understand why you bring this up.
I still understand this to be the core of your question. Can you explain what questions remain about “what is a counterfactual” after reading my post?
While I can see this working in theory, in practise it’s more complicated as it isn’t obvious from immediate inspection to what extent an argument is or isn’t dependent on counterfactuals. I mean counterfactuals are everywhere! Part of the problem is that the clearest explanation of such a scheme would likely make use of counterfactuals, even if it were later shown that these aren’t necessary.
The best source for learning about FDT is this MIRI paper, but given its length, you might find the summary in this blog post answers your questions more quickly.
The key unanswered question (well, some people claim to have solutions) in Functional Decision Theory is how to construct the logical counterfactuals that it depends on. What do I mean by logical counterfactuals? MIRI models agents as programs, i.e. logic, so that imagining an agent taking an action other than the one it takes becomes imagining logic being such that a particular function provides a different output on a given input than it actually does. Now I don’t quite agree with the logical counterfactuals framing, but I have been working on the question of constructing appropriate counterfactuals for this situation.
Is the explanation in the “What is a Counterfactual” post linked above circular?
Is the explanation in the post somehow not an explanation of counterfactuals?
I read a large chunk of the FDT paper while drafting my last comment.
The quoted sentence may hint at the root of the trouble that I and some others here seem to have in understanding what you want. You seem to be asking about the way “counterfactual” is used in a particular paper, not in general.
It is glossed over and not explained in full detail in the FDT paper, but it seems to mainly rely on extra constraints on allowable interventions, similar to the “super-specs” in one of my other papers: https://www.jameskoppel.com/files/papers/demystifying_dependence.pdf .
I’m going to go try to model Newcomb’s problem and some of the other FDT examples in Omega. If I’m successful, it’s evidence that there’s nothing more interesting going on than what’s in my causal hierarchy post.
Is the explanation in the post somehow not an explanation of counterfactuals?
Oh, it’s definitely an explanation of counterfactuals, but I wouldn’t say it’s a complete explanation of counterfactuals as it doesn’t handle exotic cases (ie Newcomb’s). I added some more background info after I posted the bounty and maybe I should have done that originally, but I posted the bounty on LW/alignment forum and that led me towards taking a certain background context as given, although I can now see that I should have clarified this originally.
Is the explanation in the “What is a Counterfactual” post linked above circular?
It seems that way, although maybe this circular dependence isn’t essential.
Take for example the concept of prediction. This seems to involve imagining different outcomes. How can we do this without counterfactuals?
I guess I have the same question with interventions. This seems to depend on the notion that we could intervene or we could not intervene. Only one of these can happen—the other is a counterfactual.
I don’t understand what counterfactuals have to do with Newcomb’s problem. You decide either “I am a one-boxer” or “I am a two-boxer,” the boxes get filled according to a rule, and then you pick deterministically according to a rule. It’s all forward reasoning; it’s just a bit weird because the action in question happens way before you are faced with the boxes. I don’t see any updating on a factual world to infer outcomes in a counterfactual world.
“Prediction” in this context is a synonym for conditioning. P(x|y) is defined as P(x,y)/P(y).
If intervention sounds circular...I don’t know what to say other than read Chapter 1 of Pearl ( https://www.amazon.com/Causality-Reasoning-Inference-Judea-Pearl/dp/052189560X ).
To give a two-sentence technical explanation:
A structural causal model is a straight-line program with some random inputs. They look like this
It’s usually written with nodes and graphs, but they are equivalent to straight-line programs, and one can translate easily between these two presentations.
In the basic Pearl setup, an intervention consists of replacing one of the assignments above with an assignment to a constant. Here is an intervention setting the sprinkler off.
From this, one can easily compute that P(wet_grass | do(sprinkler=false)) = 1/2.
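As a minimal sketch of such a straight-line program and its intervention, here is one version in Python. The specific model is an assumption: rain is a fair coin, the sprinkler runs exactly when it does not rain, and the grass is wet if either happens. It was picked because it reproduces the probabilities quoted in this comment. The names `model`, `prob_wet_grass`, and `do_sprinkler` are hypothetical.

```python
from fractions import Fraction

# Assumed background (exogenous) variable: rain is a fair coin flip.
RAIN_PRIOR = {True: Fraction(1, 2), False: Fraction(1, 2)}

def model(rain, do_sprinkler=None):
    """Straight-line program; do_sprinkler replaces one assignment with a constant."""
    if do_sprinkler is None:
        sprinkler = not rain          # original assignment (an assumption)
    else:
        sprinkler = do_sprinkler      # intervention: assign a constant instead
    wet_grass = rain or sprinkler
    return {"rain": rain, "sprinkler": sprinkler, "wet_grass": wet_grass}

def prob_wet_grass(do_sprinkler=None):
    """Exact P(wet_grass), optionally under do(sprinkler = do_sprinkler)."""
    return sum(p for rain, p in RAIN_PRIOR.items()
               if model(rain, do_sprinkler)["wet_grass"])

if __name__ == "__main__":
    print("P(wet_grass)                       =", prob_wet_grass())
    print("P(wet_grass | do(sprinkler=false)) =", prob_wet_grass(do_sprinkler=False))
```

Under these assumptions the grass is always wet in the unmodified program, and wet exactly when it rains once the sprinkler is forced off, which gives the 1/2 above.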
If you want the technical development of counterfactuals that my post is based on, read Pearl Chapter 7, or Google around for the “twin network construction.”
Or I’ll just show you in code below how you compute the counterfactual “I see the sprinkler is on, so, if it hadn’t come on, the grass would not be wet,” which is written P(wet_grass|sprinkler=true,do(sprinkler=false))=0
We construct a new program,
This is now reduced to a pure statistical problem. Run this program a bunch of times, filter down to only the runs where sprinkler_factual is true, and you’ll find that wet_grass_counterfactual is false in all of them.
If you write this program as a dataflow graph, you see everything that happens after the intervention point being duplicated, but the background variables (the rain) are shared between them. This graph is the twin network, and this technique is called the “twin network construction.” It can also be thought of as what the do(y | x → e) operator is doing in our Omega language.
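The twin network construction described here can be sketched in Python as follows. The model is again an assumption (rain is a fair coin, the sprinkler runs exactly when it does not rain), chosen to agree with the counterfactual probability of 0 stated above; the function names are hypothetical, and this is plain sampling rather than Omega.

```python
import random

def twin_program(rng):
    """One run of the twin program: shared background, duplicated downstream."""
    rain = rng.random() < 0.5                      # shared background variable

    # Factual copy: the assumed original assignments.
    sprinkler_factual = not rain
    wet_grass_factual = rain or sprinkler_factual

    # Counterfactual copy: sprinkler replaced by a constant, same rain.
    sprinkler_counterfactual = False
    wet_grass_counterfactual = rain or sprinkler_counterfactual

    return {
        "sprinkler_factual": sprinkler_factual,
        "wet_grass_factual": wet_grass_factual,
        "wet_grass_counterfactual": wet_grass_counterfactual,
    }

def counterfactual_wet_grass(n=10_000, seed=0):
    """Estimate P(wet_grass | sprinkler=true, do(sprinkler=false)) by filtering runs."""
    rng = random.Random(seed)
    runs = [twin_program(rng) for _ in range(n)]
    kept = [r for r in runs if r["sprinkler_factual"]]   # condition on the factual observation
    return sum(r["wet_grass_counterfactual"] for r in kept) / len(kept)

if __name__ == "__main__":
    print(counterfactual_wet_grass())
```

Every run that survives the filter has no rain (that is the only way the sprinkler came on in this model), so the counterfactual grass is dry in all of them. The only ingredients are a forward simulation, a filter, and one replaced assignment.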
Everyone agrees what you should do if you can precommit. The question becomes philosophically interesting when an agent faces this problem without having had the opportunity to precommit.
Okay, I see how that technique of breaking circularity in the model looks like precommitment.
I still don’t see what this has to do with counterfactuals though.
“You decide either “I am a one-boxer” or “I am a two-boxer,” the boxes get filled according to a rule, and then you pick deterministically according to a rule. It’s all forward reasoning; it’s just a bit weird because the action in question happens way before you are faced with the boxes.”
So you wouldn’t class this as precommitment?
I realize now that this expressed as a DAG looks identical to precommitment.
Except, I also think it’s a faithful representation of the typical Newcomb scenario.
Paradox only arises if you can say “I am a two-boxer” (by picking up two boxes) while you were predicted to be a one-boxer. This can only happen if there are multiple nodes for two-boxing set to different values.
But really, this is a problem of the kind solved by superspecs in my Onward! paper. There is a constraint that the prediction of two-boxing must be the same as the actual two-boxing. Traditional causal DAGs can only express this by making them literally the same node; super-specs allow more flexibility. I am unclear how exactly it’s handled in FDT, but it has a similar analysis of the problem (“CDT breaks correlations”).
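A minimal Python sketch of the two ways of wiring the nodes, assuming the standard Newcomb payoffs ($1,000,000 in the opaque box if one-boxing was predicted, $1,000 always in the transparent box). The function names are hypothetical, and this illustrates only the single-node versus separate-node distinction, not super-specs or FDT itself.

```python
def payoff(prediction, action):
    """Standard Newcomb payoffs (assumed): the opaque box is filled only if one-boxing was predicted."""
    opaque = 1_000_000 if prediction == "one-box" else 0
    transparent = 1_000 if action == "two-box" else 0
    return opaque + transparent

def linked_payoff(policy):
    """Prediction and action are literally the same node, so they cannot disagree."""
    return payoff(prediction=policy, action=policy)

def unlinked_payoff(action, fixed_prediction):
    """Prediction is a separate node held fixed while the action is varied."""
    return payoff(prediction=fixed_prediction, action=action)

if __name__ == "__main__":
    for policy in ("one-box", "two-box"):
        print("linked  ", policy, linked_payoff(policy))
    for prediction in ("one-box", "two-box"):
        for action in ("one-box", "two-box"):
            print("unlinked", prediction, action, unlinked_payoff(action, prediction))
```

With the nodes linked, one-boxing yields $1,000,000 and two-boxing $1,000. With the prediction held fixed, two-boxing is $1,000 better in every row, which is the usual source of the apparent paradox.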
My entry. Focuses on the metaphysics of counterfactuals arguing that there are two types based upon two different possible states of a person’s mental model of causal relationships. This agrees with circularity. In general, I concur with principles 1-4 which you outline. My post hits on a bit of criteria a) b) and d).
https://www.lesswrong.com/posts/EvDsnqvmfnjdQbacb/circular-counterfactuals-only-that-which-happens-is-possible
Also, to the people who see everything confusing about counterfactuals as solved, this seems like a failure to ask new questions. If counterfactuals were “solved”, I would expect to be living in a world where there would be no difficulty reverse engineering anything, the theory and practice of prior formation would also be solved, and decision theory would be unified into one model. We don’t live in that world.
I think there is still tons of fertile ground for thinking about the use of counterfactuals and we have not yet really scratched the surface of what’s possible.
Being solved at the level that philosophy operates doesn’t imply being solved at the engineering level.
You are right, of course. But even at the “level of philosophy” there are different levels, corridors, and extrapolations possible.
For example, it is not a question of engineering whether counterfactuals on chaotic systems are conditional predictions, or whether counterfactuals of different types of relationships have less necessary connection.
I’ll make a counter-claim and say that most people on LW in fact have rejected the use of Newcomb’s Problem as a test that will say something useful about decision theories.
That being said, there is definitely a sub-community which believes deeply in the relevance of Newcomb’s Problem as a test. This sub-community has historically created, and is still creating, a lot of traffic on this forum. This is to be expected: the people who reject Newcomb’s Problem do not tend to post about it that much.
Personally, I reject Newcomb’s Problem as a test.
I am also among the crowd who have posted explanations of Pearl Causality and Counterfactuals. My explanation here highlights the ‘using a different world model’ interpretation of Pearl’s counterfactual math, so it may in fact touch on your reframing:
Overall, reading the post and the comment section, I feel that, if I reject Newcomb’s Problem as a test, I can only ever write things that will not meet your prize criterion of usefully engaging with ‘circular dependency’.
I have a sense that with ‘circular dependency’ you are also pointing to a broader class of philosophical problems of ‘what does it mean for something to be true or correctly inferred’. If these were spelled out in detail, I also believe that I would end up rejecting the notion that we need to solve all these open problems definitively, the notion that these problems represent gaps in an agent foundations framework that still need to be filled, if the framework is to support AGI safety/alignment.
Firstly, I don’t see why that would interfere with evaluating possible arguments for and against circular dependency. It’s possible for an article to be “here’s why these 3 reasons we might think counterfactuals are circular are all false” (not stating that an article would necessarily have to engage with 3 different arguments to win).
Secondly, I guess my issue with most of the attempts to say “use system X for counterfactuals” is that people seem to think merely not mentioning counterfactuals means that there isn’t a dependence on them. So there likely needs to be some part of such an article discussing why things that look counterfactual really aren’t.
I briefly skimmed your article and I’m sure if I read it further I’d learn something interesting, but merely as is it wouldn’t be on scope.
OK, so if I understand you correctly, you posit that there is something called ‘circular epistemology’. You said in the earlier post you link to at the top:
You further suspect that circular epistemology might have something useful to say about counterfactuals, in terms of offering a justification for them without ‘hitting a point where we can provide no justification at all’. And you have a bounty for people writing more about this.
Am I understanding you correctly?
Yeah, I believe epistemology to be inherently circular. I think it has some relation to counterfactuals being circular, but I don’t see it as quite the same, as counterfactuals seem a lot harder to avoid using than most other concepts. The point of mentioning circular epistemology was to persuade people that my theory isn’t as absurd as it sounds at first.
Wait, I was under the impression from the quoted text that you make a distinction between ‘circular epistemology’ and ‘other types of epistemology that will hit a point where we can provide no justification at all’. i.e. these other types are not circular because they are ultimately defined as a set of axioms, rewriting rules, and observational protocols for which no further justification is being attempted.
So I think I am still struggling to see what flavour of philosophical thought you want people to engage with, when you mention ‘circular’.
Mind you, I see ‘hitting a point where we provide no justification at all’ as a positive thing in a mathematical system, a physical theory, or an entire epistemology, as long as these points are clearly identified.
If you’re referring to the Wittgensteinian quote, I was merely quoting him, not endorsing his views.
Not aware of which part would be a Wittgensteinian quote. It is a long time since I read Wittgenstein, and I read him in German. In any case, I remain confused about what you mean by ‘circular’.
Hmm… Oh, I think that was elsewhere on this thread. Probably not to you. Eliezer’s Where Recursive Justification Hits Bottom seems to embrace a circular epistemology despite its title.
He doesn’t show much sign of embracing the validity of all circular arguments, and neither do you.
I never said all circular arguments are valid
That doesn’t help. If recursive justification is a particular kind of circular argument that’s valid, so that others are invalid, then something makes it valid. But what? EY doesn’t say. And how do we know that the additional factor isn’t doing all the work?
??? I don’t follow. You meant to write “use system X instead of using system Y which calls itself a definition of counterfactuals”?
What I mean is that some people seem to think that if they can describe a system that explains counterfactuals without mentioning counterfactuals when explaining them that they’ve avoided a circular dependence. When of course, we can’t just take things at face value, but have to dig deeper than that.
OK thanks for explaining. See my other recent reply for more thoughts about this.
I think this goes too far. We can give an account of counterfactuals from assumptions of symmetry. This account is unsatisfactory in many ways—for one thing, it implies that counterfactuals exist much more rarely than we want them to. Nonetheless, it seems to account for some properties of a counterfactual and is able to stand up without counterfactual assumptions to support it. I think it also provides an interesting lens for examining decision theory paradoxes.
What are you trying to get/do? I’m asking very seriously, as I can’t quite tell where we land between philosophy of language, human behaviour and cognition, AI architecture or some unification problem of them.
From a philosophy of language perspective, I personally like to argue that hypotheticals in past tense are just wrong, but are used in the same way present and future tense versions are: expressing internal belief about how causality will play out for the sake of aligning them in a group.
I’m aware of other approaches, but that one has the convenient property of bringing us to human (or biological, as far as we know it) cognition, where mental models need to be expressed to get some value from social systems providing feedback. A very fast way to do that, although limited in ways that create whole branches of our societies, is talking about causality in stories. Here counterfactuals are a special form of such stories. As a side note, I would be willing to argue that their being useful and ‘entertaining’ (as in ‘aesthetically pleasing, but not necessarily beautiful’) is a related phenomenon.
I think both aspects can inform AI design, depending on where you want to put the ‘AI’ in the whole process. I’m not sure I’m optimistic about unification theory ;)
A post that attempts to evaluate the arguments for and against this principle would likely be more philosophical. A post that tried to draw out the practical consequences would tend to be more on the side of decision theory, though I expect it would involve delving into the philosophy as well.
I don’t see why philosophy of language would tell you how reality works.
There are at least three possibilities. David Lewis-level realism, where counterfactual worlds seem fully real to their inhabitants, is an extreme. Moderate realism about counterfactuals is equivalent to indeterminism: only one thing happens, but it didn’t have to happen. And, absent any kind of realism, there are still logical counterfactuals.
Even if you accept the Kantian framework, it involves N>1 basic categories, so it doesn’t follow that any particular category has to apply to itself. (And if you accept the full Kantian framework, the problems don’t stop with counterfactuals).
Well, that’s two examples of circular dependency.
Regarding moderate realism, if what happened didn’t have to happen, then that implies that other things could have happened (these are counterfactuals). But this raises the question, what are these counterfactuals? You’ve already rejected Counterfactual Realism which seems to lead towards the two possibilities I suggested:
a) Counterfactuals are an inevitable and essential part of how we make sense of the world by our very nature
b) Counterfactuals are a semi-arbitrary and contingent system that we’ve adopted in order to navigate the world
(Some combination of the two is another possibility.)
Presumably, you don’t think moderate realism leads you down this path. Where do you think it leads instead?
“Even if you accept the Kantian framework, it involves N>1 basic categories”
Interesting point. I’m somewhat skeptical of this, but I wouldn’t completely rule it out either. (One thing I think plausible is that there could be a category A reducible to a category B which is then reducible back to A; but this wouldn’t avoid the circularity)
“Well, that’s two examples of circular dependency”—Yes, that’s what I said. I guess I’m confused why you’re repeating it
I haven’t rejected counterfactual realism. I’ve pointed out that Lewis’s modal realism doesn’t deal with counterfactuals as such, because it is a matter of perspective whether a world is factual (ie. contains me) or counterfactual (doesn’t).
What I have called moderate realism is the only position that holds counterfactuals to be both intrinsically counterfactual and real.
Kantianism about counterfactuals might be true, but if it is, you are also going to have problems with causality etc. There’s no special problem of counterfactuals.
That’s an odd thing to say. Kant lays out his categories, and there is more than one.
How so? I would have said the opposite.
Yeah, if Kantianism about counterfactuals were true, it would be strange to limit it. My expectation would be that it would apply to a bunch of other things as well.
Sorry, I should have been clearer. I wasn’t disagreeing with there being more than one category, but your conclusion from this.
I wasn’t saying that that is true per se, I was saying it’s Lewis’s view.
Well, if you think there is a special problem with counterfactuals, then that needs a basis other than general Kantian issues.
Ah, okay. I get it now.
So, this post only deals with agent counterfactuals (not environmental counterfactuals), but I believe I have solved the technical issue you mention about the construction of logical counterfactuals as it concerns TDT. See: https://www.alignmentforum.org/posts/TnkDtTAqCGetvLsgr/a-possible-resolution-to-spurious-counterfactuals
I have fewer thoughts about environmental counterfactuals but think a similar approach could be used to make statements along those lines, i.e. construct alternate agents receiving a different observation about the world. I’m not sure any very specific technical problem exists with that, though—the TDT paper already talks about world model surgery.
I added a comment on the post directly, but I will add: we seem to roughly agree on counterfactuals existing in the imagination in a broad sense (I highlighted two ways this can go above—with counterfactuals being an intrinsic part of how we interact with the world or a pragmatic response to navigating the world). However, I think that following this through and asking why we care about them if they’re just in our imagination ends up taking us down a path where counterfactuals being circular seems plausible. On the other hand, you seem to think that this path takes us somewhere where there isn’t any circularity. Anyway, that’s the difference in our positions as far as I can tell from having just skimmed your link.
I was attempting to solve a relatively specific technical problem related to self-proofs using counterfactuals. So I suppose I do think counterfactuals (at least non-circular ones) are useful. But I’m not sure I’d commit to any broader philosophical statement about counterfactuals beyond “they can be used in a specific formal way to help functions prove statements about their own output in a way that avoids Löb’s Theorem issues”. That being said, that’s a pretty good use, if that’s the type of thing you want to do? It’s also not totally clear if you’re imagining counterfactuals the same way I am. I am using the English term because it matches the specific thing I’m describing decently well, but the term has a broad meaning, and without having an extremely specific imagining, it’s hard to make any more statements about what can be done with them.
Firstly, thanks for engaging with the circularity argument as there, unfortunately, hasn’t been much engagement with it on the thread.
I guess I don’t see a reason to reframe it like this. Your objection to circularity is:
But that only makes sense if you’ve already reframed it. If I simply talk about circularity and avoid defining any concept in the circle as more or less fundamental, then that argument doesn’t get off the ground. So I guess it seems stronger to leave the framing as is since that dodges the argument you just provided.
I agree that asking the question involves assuming the existence of a lot more concepts, but why would this affect the claimed circularity of counterfactuals?
Perhaps they do, but I guess I’m challenging this by suggesting that counterfactuals only make sense from within the counterfactual perspective. Or reframing this, counterfactuals only make sense from a cognitive frame.
I don’t see this as connecting to the free will debate. “Because” assumes that humans have such a thing as a will, but there’s no requirement for it to be free.
I agree with this, although I can see why my position is confusing. I guess I believe both that:
a) Humans automatically make use of some intuitive notion or notions of counterfactuals
b) People interested in decision theory intentionally try to construct a more principled and consistent notion of counterfactuals
I guess it was the latter question I was referring to when I was asking why humans construct counterfactuals.
I guess I’d roughly describe it as something that forms models of the world.
Well, this is why I proposed that counterfactuals only make sense from within the counterfactual view—by which I meant that when we try to explain what counterfactuals are we inevitably find ourselves making use of the notion of counterfactuals—but perhaps you think my framing/interpretation could be improved.
I think one thing that this discussion has highlighted is that I should be highlighting and paying more attention to the distinction between our primitive, intuitive notions of counterfactuals and the more formal notions that we construct.
I guess another thing I find myself wondering about upon reading this approach is how the notion of fundamentality fits into a circular epistemology. I think they are compatible—one way this could occur is if some notions are outside of the loop, but are contingent on concepts that do form such a loop. Unfortunately, this is much harder to explain just via text—ie. without a diagram.
I’m not 100% sure on the definition of coherentism, but I reject attempts to define truth in terms of coherence whilst also thinking that our epistemological process should be primarily about seeking coherence (I want to leave myself an out here to acknowledge that sometimes forcing coherence can take us further away from the truth).
I guess when we’re searching for coherence we need to make decisions about which nodes we let update other nodes, so this seems to provide room for some nodes to be considered more foundational than other nodes.
I think of truth in terms of correspondence. Of course, we don’t actually have access to the territory.
Phenomenal experience with external reality.
Phenomenal experience is technically a subset of reality.