I also think that there are lots of specific operations that are all “counterfactual reasoning”
Agreed. This is definitely something that I would like further clarity on.
Hmm, my hunch is that you’re misunderstanding me here. There are a lot of specific operations that are all “making a fist”. I can clench my fingers quickly or slowly, strongly or weakly, left hand or right hand, etc. By the same token, if I say to you “imagine a rainbow-colored tree; are its leaves green?”, there are a lot of different specific mental models that you might be invoking. (It could have horizontal rainbow stripes on the trunk, or it could have vertical rainbow stripes on its branches, etc.) All those different possibilities involve constructing a counterfactual mental model and querying it, in the same nuts-and-bolts way. I just meant, there are many possible counterfactual mental models that one can construct.
I’m guessing that your position might be that there are just mistakes, and that there aren’t mistakes that are more or less philosophically fruitful? There are just mistakes. Is that correct?
Suppose I ask “There’s a rainbow-colored tree somewhere in the world; are its leaves green?” You think for a second. What’s happening under the surface when you think about this? Inside your head are various different models pushing in different directions. Maybe there’s a model that says something like “rainbow-colored things tend to be rainbow-colored in all respects”. So maybe you’re visualizing a rainbow-colored tree, and querying the color of the leaves in that model, and this model is pushing on your visualized tree and trying to make it have a color scheme that’s compatible with the kinds of things you usually see, e.g. in cartoons, which would be rainbow-colored leaves. But there’s also a botany model that says “tree leaves tend to be green, because that’s the most effective for photosynthesis, although there are some exceptions like Japanese maples and autumn colors”. In scientifically-educated people, probably there will also be some metacognitive knowledge that principles of biology and photosynthesis are profound deep regularities in the world that are very likely to generalize, whereas color-scheme knowledge comes from cartoons etc. and is less likely to generalize.
So what’s at play is not “the nature of counterfactuals”, but the relative strengths of these three specific mental models (and many more besides) that are pushing in different directions. The way it shakes out will depend on the particular person and their life experience (and in particular, how much of a track-record of successful predictions these models have built up in similar contexts).
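If it helps to make that concrete, here’s a purely illustrative toy sketch of the idea: several models each push an answer to the same query, and each one’s influence is set by its (entirely made-up) track record of successful predictions in similar contexts. The model names, probabilities, and weights are all invented for illustration.

```python
# Illustrative toy only: three hypothetical "models" each push an answer to
# "are the rainbow-colored tree's leaves green?", weighted by a made-up track
# record of successful predictions in similar contexts.
models = {
    # name: (P(leaves are green), weight from past predictive success)
    "cartoon color-scheme prior": (0.20, 0.3),
    "botany / photosynthesis": (0.95, 0.9),
    "metacognition about which regularities generalize": (0.90, 0.8),
}

weighted = sum(p * w for p, w in models.values())
total_weight = sum(w for _, w in models.values())
print(f"P(leaves are green) ~= {weighted / total_weight:.2f}")
```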
By the same token, I think every neurotypical human thinking about Newcomb’s problem is using counterfactual reasoning, and I think that there isn’t any interesting difference in the general nature of the counterfactual reasoning that they’re using. But the mental model of free will is different in different people, and the mental model of Omega is different in different people, etc.
Hmm, maybe we’re talking past each other a bit because of the learning-algorithm-vs-trained-model division. Understanding the learning algorithm is like being able to read and understand the source code for a particular ML paper (and the PyTorch source code that it calls in turn). Understanding the trained model is like OpenAI Microscope.
(It’s really “learning algorithm & inference algorithm”—the first changes the parameters, the second chooses what to do right now. I’m just calling it “learning algorithm” for short.)
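To make the division a bit more concrete, here’s a minimal toy sketch (a generic regression task with arbitrary made-up data, nothing specific to brains or counterfactuals): the training loop is the “learning algorithm” side, and the particular weights it happens to produce are the “trained model” side.

```python
import torch
import torch.nn as nn

# "Learning algorithm" side: a generic update rule, the same few lines
# regardless of what task or data we feed it.
model = nn.Linear(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

xs = torch.tensor([[0.0], [1.0], [2.0], [3.0]])
ys = 2 * xs + 1  # arbitrary target relationship

for _ in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(xs), ys)
    loss.backward()
    optimizer.step()

# "Trained model" side: the particular parameters that training produced,
# which is the kind of thing tools like OpenAI Microscope try to interpret
# in large networks.
print(model.weight.item(), model.bias.item())  # roughly 2.0 and 1.0
```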
I usually take the perspective that “the main event” is to understand the learning algorithm, because that’s what you need to build AGI, and that’s what the genome needs to build humans (thanks to within-lifetime learning), whereas understanding the trained model is “a sideshow”, unnecessary for building AGI, but still worth talking about for safety and whatnot.
On the “learning algorithm” side, I put “the basic capability to do counterfactual reasoning operations”. On the “trained model” side, I put all the learned heuristics about how reliable counterfactual reasoning is under what circumstances, and also all the learned concepts that go into a particular “counterfactual reasoning” operation (e.g. botany concepts, free will concepts, etc.).
Then when I brashly declare “I basically understand counterfactual reasoning”, I’m just talking about the stuff on the “learning algorithm” side. Whereas it seems that you feel like your project is to understand stuff on both sides—not only what a “counterfactual reasoning” operation is at a nuts-and-bolts level, but also all the other things that go into Newcomb’s problem, like whether there’s a “free will” concept in the world-model and what other concepts it’s connected to and how strongly (all of which can impact the results of a “counterfactual reasoning” operation). Then that research program seems to me to be more about normative decision theory and epistemology (e.g. “what to do in Newcomb’s problem”), rather than about the nature of counterfactual reasoning per se. Or I guess perhaps what you’re going for is closer to “practical advice that helps adult humans use counterfactual reasoning to reach correct conclusions”? In that case I’d be a bit surprised if there was much generically useful advice like that; I would expect that the main useful thing is object-level stuff like teaching better intuitions about the nature of free will etc.
I just meant, there are many possible counterfactual mental models that one can construct.
I agree that there isn’t a single uniquely correct notion of a counterfactual. I’d say that we want different things from this notion and there are different ways to handle the trade-offs.
By the same token, I think every neurotypical human thinking about Newcomb’s problem is using counterfactual reasoning, and I think that there isn’t any interesting difference in the general nature of the counterfactual reasoning that they’re using.
I find this confusing, as CDT counterfactuals (where you can only project forward) seem very different from things like FDT (where you can project back in time as well).
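For concreteness, here’s a toy calculation on Newcomb’s problem showing how the two styles of counterfactual come apart. The payoffs and the assumed 99% predictor accuracy are illustrative only, and it deliberately glosses over the difference between EDT- and FDT-style reasoning.

```python
# Toy Newcomb's problem; payoffs and predictor accuracy are illustrative only.
ACCURACY = 0.99          # assumed reliability of Omega's prediction
SMALL, BIG = 1_000, 1_000_000

# Counterfactuals that reach "back in time": the prediction co-varies with the
# action being evaluated (lumping together EDT- and FDT-style reasoning).
ev_one_box = ACCURACY * BIG
ev_two_box = (1 - ACCURACY) * BIG + SMALL
print("prediction co-varies:", ev_one_box, ">", ev_two_box)   # one-boxing wins

# CDT-style counterfactuals only project forward: the prediction, and hence the
# opaque box's contents, is held fixed; two-boxing then dominates either way.
for contents in (BIG, 0):
    print("contents fixed at", contents, ":", contents, "<", contents + SMALL)
```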
I usually take the perspective that “the main event” is to understand the learning algorithm, because that’s what you need to build AGI, and that’s what the genome needs to build humans
Well, we need the information encoded in our DNA rather than what is actually implemented in humans (clarification: what is implemented in humans is significantly influenced by society). However, we aren’t at the level where we can access that by analysing the DNA directly, or people’s brain structure for that matter, so we have to reverse-engineer it from behaviour.
Or I guess perhaps what you’re going for is closer to “practical advice that helps adult humans use counterfactual reasoning to reach correct conclusions”?
I’ve very much focused on trying to understand how to solve these problems in theory, rather than on how we can correct cognitive flaws in humans or how to adapt decision theory to be easier or more convenient to use.
Insofar as I’m interested in how average humans reason counterfactually, it’s mostly about trying to understand the various heuristics that are the basis of counterfactuals. I guess I believe that we need counterfactuals to understand and evaluate these heuristics, but I’m hoping that we can construct something reflexively consistent.
By the same token, I think every neurotypical human thinking about Newcomb’s problem is using counterfactual reasoning, and I think that there isn’t any interesting difference in the general nature of the counterfactual reasoning that they’re using.
I find this confusing, as CDT counterfactuals (where you can only project forward) seem very different from things like FDT (where you can project back in time as well).
I think there is “machinery that underlies counterfactual reasoning” (which incidentally happens to be the same as “the machinery that underlies imagination”). My quote above was saying that every human deploys this machinery when you ask them a question about pretty much any topic.
I was initially assuming (by default) that if you’re trying to understand counterfactuals, you’re mainly trying to understand how this machinery works. But I’m increasingly confident that I was wrong, and that’s not in fact what you’re interested in. Instead it seems that your interests are more like “how would an AI, equipped with this kind of machinery, reach correct conclusions about the world?” (After all, the machinery by itself can lead to both correct and incorrect conclusions—just as “thinking / reasoning in general” can lead to correct or incorrect conclusions.)
Given what (I think) you’re trying to do above, I’m somewhat skeptical that you’ll make progress by thinking about the philosophical nature of counterfactuals in general. I don’t think there’s a clean separation between “good counterfactual reasoning” and “good reasoning in general”. If I say some counterfactual nonsense like “If the Earth were a flat disk, then the north pole would be in the center,” I think the reason it’s nonsense lives at the object-level, i.e. the detailed content of the thought in the context of everything else we know about the world. I don’t think the problem with that nonsense thought can be diagnosed at the meta-level, i.e. by examining structural properties of its construction as a counterfactual or whatever.
So by the same token, I think that “what counterfactuals make sense in the context of decision-making” is a decision theory question, not a counterfactuals question, and I expect a good answer to look like explicit discussions of decision theory as opposed to looking like a more general discussion of the philosophical nature of counterfactuals. (That said, the conclusion of that decision theory discussion could certainly look like a prescription on the content of counterfactual reasoning in a certain context, e.g. maybe the decision theory discussion concludes with ”...Therefore, when making decisions, use FDT-type counterfactuals” or whatever.)
I think there is “machinery that underlies counterfactual reasoning”
I agree that counterfactual reasoning is contingent on certain brain structures, but I would say the same about logic as well, and it’s clear that the logic of a kindergartener is very different from that of a logic professor. Although perhaps we’re getting into a semantic debate, and what you mean is that the fundamental machinery is more or less the same.
I was initially assuming (by default) that if you’re trying to understand counterfactuals, you’re mainly trying to understand how this machinery works. But I’m increasingly confident that I was wrong, and that’s not in fact what you’re interested in. Instead it seems that your interests are more like “how would an AI, equipped with this kind of machinery, reach correct conclusions about the world?”
Yeah, this seems accurate. I see understanding the machinery as the first step towards the goal of learning to counterfactually reason well. As an analogy, suppose you’re trying to learn how to reason well. It might make sense to figure out how humans reason, but if you want to build a better reasoning machine and not just duplicate human performance, you’d want to be able to identify some of these processes as good reasoning and some as biases.
I don’t think there’s a clean separation between “good counterfactual reasoning” and “good reasoning in general”
I guess I don’t see why there would need to be a separation in order for the research direction I’ve suggested to be insightful. In fact, if there isn’t a separation, this direction could even be more fruitful as it could lead to rather general results.
If I say some counterfactual nonsense like “If the Earth were a flat disk, then the north pole would be in the center,” I think the reason it’s nonsense lives at the object-level, i.e. the detailed content of the thought in the context of everything else we know about the world
I would say (as a slight simplification) that our goal in studying counterfactual reasoning should be to get counterfactuals to a point where we can answer questions about them using our normal reasoning.
I think that “what counterfactuals make sense in the context of decision-making” is a decision theory question, not a counterfactuals question, and I expect a good answer to look like explicit discussions of decision theory as opposed to looking like a more general discussion of the philosophical nature of counterfactuals
That post certainly seems to contain an awful lot of philosophy to me. And even though this post and my post On the Nature of Counterfactuals don’t make any reference to decision theory, that doesn’t mean that decision theory isn’t in the background influencing what I write. I’ve written a lot of posts here, many of which discuss specific decision theory questions.
I guess I would still consider Joe Carlsmith’s post a high-quality post if it had focused exclusively on the more philosophical aspects. Philosophical arguments are harder to evaluate than mathematical ones, and that can be disconcerting for some people, especially those used to the certainty of mathematics, but I believe it’s possible to get to the level where you can avoid formalising things a lot of the time because you have enough experience to know how things will shake out.
Although I suppose in this case my reason for avoiding formalisation is that I see premature formalisation as a critical error. Once someone has produced a formal theory, they will feel psychologically compelled to defend it, especially if it is mathematically beautiful, so I believe it’s important to be very careful about making sure the assumptions are right before attempting to formalise anything.