As an additional data point, I also still do not have a very good understanding of your ideas about causality (although I did note earlier that they seem rather different from Pearl’s, which are similar to Ilya’s). I also note that nobody else seems to have a good understanding of your ideas, at least not enough to try to build upon them, either here on LW or on the decision theory mailing list, or to try to explain them to me when I asked.
Interesting. Sorry to bother you further, but can I ask you to quote a particular sentence or paragraph above that seems unclear? Or was the above clear, but it implies other questions that aren’t clear, or the motivations aren’t clear?
As a third data point, I used to be very confused about your ideas about causality, but your recent writing has helped a lot. To make embarrassingly clear how very wrong I’ve been able to be, some years ago when you’d told us about TDT but not given details, I thought you had a fully worked-out and justified theory about how a decision agent could use causal graphs to model its uncertainty about the output of platonic computations, and use do() on its own output to compute the utility of different courses of action, and I got very frustrated when I simply couldn’t figure out how to fill in the details of that...
...hmm. (I should probably clarify: when I say “use causal graphs to reason about”, I don’t mean in the ‘trivial’ sense you are actually using where the platonic computations cause other things but are themselves uncaused in the model; I mean some sort of system where different computations and/or logical facts about computations form a non-degenerate graph, and where do() severs one node somewhere in the middle of that graph from its parents.) “And”, I was going to say, “when you finally did tell us more, I had a strong ‘oh’ moment when you said that you still weren’t able to give a completely satisfying theory/justification, but were reasonably satisfied with the version you had. But I still continued to think that my picture of what you had been trying to do had been correct, only you didn’t have a fully worked-out theory of it, either.” The actual quote that turned into this memory of things seems to be:
Note that this does not solve the remaining open problems in TDT (though Nesov and Dai may have solved one such problem with their updateless decision theory). Also, although this theory goes into much more detail about how to compute its counterfactuals than classical CDT, there are still some visible incompletenesses when it comes to generating causal graphs that include the uncertain results of computations, computations dependent on other computations, computations uncertainly correlated to other computations, computations that reason abstractly about other computations without simulating them exactly, and so on.
But there’s also this:
The three-sentence version is: Factor your uncertainty over (impossible) possible worlds into a causal graph that includes nodes corresponding to the unknown outputs of known computations; condition on the known initial conditions of your decision computation to screen off factors influencing the decision-setup; compute the counterfactuals in your expected utility formula by surgery on the node representing the logical output of that computation.
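As a rough illustration of the three-sentence recipe quoted above, here is a toy Newcomb-like model. Everything in it is an assumption made for illustration: the node names `D` (the output of the decision computation), `P` (the predictor's prediction) and `A` (the physical act), a perfect predictor, and the payoff amounts. It is not the formalism from the TDT writeup. Note that `D` is a root node here. That is the ‘trivial’ sense of causal graphs over computations described above, where the computation causes other things but is itself uncaused.

```python
# Toy Newcomb model (hypothetical): node D = output of the agent's decision
# computation, P = predictor's prediction, A = the agent's physical act.
# Assumptions: perfect predictor (P = D) and the agent acts as computed (A = D).

BIG, SMALL = 1_000_000, 1_000

def payoff(prediction: str, act: str) -> int:
    # The opaque box is filled only if one-boxing was predicted.
    opaque = BIG if prediction == "one" else 0
    return opaque if act == "one" else opaque + SMALL

def eu_surgery_on_output(d: str) -> float:
    # do(D = d): both children of D (P and A) follow the new value.
    return payoff(prediction=d, act=d)

def eu_surgery_on_act(a: str, prior_one: float) -> float:
    # do(A = a) only: A is cut off from D, so P still follows the prior over D.
    return prior_one * payoff("one", a) + (1 - prior_one) * payoff("two", a)

if __name__ == "__main__":
    for choice in ("one", "two"):
        print(choice, eu_surgery_on_output(choice), eu_surgery_on_act(choice, 0.5))
```

Surgery on the output node favours one-boxing (1,000,000 vs 1,000). Surgery on the act alone, which is CDT-style, favours two-boxing by exactly `SMALL`, whatever the prior.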
And later:
Those of you who’ve read the quantum mechanics sequence can extrapolate from past experience that I’m not bluffing.
Huh. In retrospect I can see how this matches my current understanding of what you’re doing, but comparing this to what I wrote in the first paragraph above (before searching for that post), it’s actually surprisingly nonobvious where the difference is between what you wrote back then and what I wrote just now to explain the way in which I had horribly misunderstood you...
Anyway. As for what you wrote in the great-grandparent, I had to read it slowly, but most of it makes perfect sense to me; the last paragraph I’m not quite as sure about, but there too I think I understand what you mean.
There is, however, one major point on which I currently feel confused. You seem to be saying that causal reasoning should be seen as a very fundamental principle of epistemology, and on your list of open problems, you have “Better formalize hybrid of causal and mathematical inference.” But it seems to me that if you just do inference about logical uncertainty, and the mathematical object you happen to be interested in is a cellular automaton or the PDE giving the time evolution of some field theory, then your probability distribution over the state at different times will necessarily happen to factor in such a way that it can be represented as a causal model. So why treat causality as something fundamental in your epistemology, and then require deep thinking about how to integrate it with the rest of your reasoning system, rather than treating it as an efficient way to compress some probability distributions, which then just automatically happens to apply to the mathematical objects representing our actual physics? (At this point, I ask this question not as a criticism, but simply to illustrate my current confusion.)
So why treat causality as something fundamental in your epistemology, and then require deep thinking about how to integrate it with the rest of your reasoning system, rather than treating it as an efficient way to compress some probability distributions, which then just automatically happens to apply to the mathematical objects representing our actual physics?
Because causality is not about efficiently encoding anything. A causal process a → b → c is equally efficiently encoded via c → b → a.
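The symmetry claim can be checked numerically. Below is a minimal sketch with arbitrary, made-up probabilities for binary `a`, `b`, `c`. It builds the joint from the chain a → b → c, re-factors the same joint as c → b → a using Bayes' rule, and confirms that both factorizations reproduce it with the same number of free parameters (1 + 2 + 2 = 5).

```python
from itertools import product

# Forward factorization a -> b -> c (illustrative numbers only).
p_a = {0: 0.3, 1: 0.7}
p_b_given_a = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.2, (1, 1): 0.8}   # key: (b, a)
p_c_given_b = {(0, 0): 0.6, (1, 0): 0.4, (0, 1): 0.25, (1, 1): 0.75}  # key: (c, b)

joint = {(a, b, c): p_a[a] * p_b_given_a[(b, a)] * p_c_given_b[(c, b)]
         for a, b, c in product((0, 1), repeat=3)}

def marginal(keep):
    """Sum the joint down to the variables at the given positions (0=a, 1=b, 2=c)."""
    out = {}
    for abc, p in joint.items():
        key = tuple(abc[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

# Reverse factorization c -> b -> a, derived from the same joint.
p_c, p_b = marginal([2]), marginal([1])
p_bc, p_ab = marginal([1, 2]), marginal([0, 1])
p_b_given_c = {(b, c): p_bc[(b, c)] / p_c[(c,)] for b, c in product((0, 1), repeat=2)}
p_a_given_b = {(a, b): p_ab[(a, b)] / p_b[(b,)] for a, b in product((0, 1), repeat=2)}

# Valid because a and c are independent given b in the chain.
reverse = {(a, b, c): p_c[(c,)] * p_b_given_c[(b, c)] * p_a_given_b[(a, b)]
           for a, b, c in product((0, 1), repeat=3)}
```

Nothing in the observational joint prefers one direction. The two graphs differ only in what they say about interventions.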
But it seems to me that if you just do inference about logical uncertainty, and the mathematical object you happen to be interested in is a cellular automaton or the PDE giving the time evolution of some field theory, then your probability distribution over the state at different times will necessarily happen to factor in such a way that it can be represented as a causal model.
This is not true, for lots of reasons, one of them having to do with “observational equivalence.” A given causal graph has many different graphs with which it agrees on all observable constraints. None of these other graphs is causal. The three-node chain above is one example.
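To make observational equivalence concrete, here is a minimal sketch that reuses one arbitrary joint over binary `a`, `b`, `c` (illustrative numbers) and computes do(b = 1) under each of the two chain orientations by truncated factorization. The graphs agree on every observational query but disagree on the interventional ones.

```python
from itertools import product

# Joint generated by the chain a -> b -> c (illustrative numbers only).
P_A1 = 0.7                        # P(a = 1)
P_B1_GIVEN_A = {0: 0.1, 1: 0.8}   # P(b = 1 | a)
P_C1_GIVEN_B = {0: 0.4, 1: 0.75}  # P(c = 1 | b)

def bern(p1, x):
    return p1 if x == 1 else 1 - p1

joint = {(a, b, c): bern(P_A1, a) * bern(P_B1_GIVEN_A[a], b) * bern(P_C1_GIVEN_B[b], c)
         for a, b, c in product((0, 1), repeat=3)}

def prob(pred):
    return sum(p for abc, p in joint.items() if pred(*abc))

def cond(pred, given):
    return prob(lambda a, b, c: pred(a, b, c) and given(a, b, c)) / prob(given)

# Forward graph a -> b -> c: do(b=1) leaves a at its marginal,
# and c still responds to b through P(c | b).
fwd_a = prob(lambda a, b, c: a == 1)
fwd_c = cond(lambda a, b, c: c == 1, lambda a, b, c: b == 1)

# Reverse graph c -> b -> a: c is now upstream of b and keeps its marginal,
# while a responds to b through P(a | b).
rev_a = cond(lambda a, b, c: a == 1, lambda a, b, c: b == 1)
rev_c = prob(lambda a, b, c: c == 1)

print("P(a=1 | do(b=1)):", fwd_a, rev_a)
print("P(c=1 | do(b=1)):", fwd_c, rev_c)
```

Only an experiment that actually sets `b` can tell the two graphs apart. That is why the reversed graph, although it fits the data equally well, is not the causal one.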
Sorry, I understand the technical point about causal graphs you are referring to, but I do not understand the argument you’re trying to make with it in this context.
Suppose it’s the year 2100, and we have figured out the true underlying laws of physics, and it turns out that we run on a cellular automaton, and we have some very large and energy-intensive instruments that allow us to set up experiments where we can precisely set up the states of individual primitive cells. Now we want to use probabilistic reasoning to examine the time evolution of a cluster of such cells if we have only probabilistic information about the boundary conditions. Since this is a completely ordinary cellular automaton, we can describe it using a causal model, where the state of a cell at time t+1 is caused by its own state and the state of its neighbours at time t.
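The "state at t+1 is caused by its own state and its neighbours' states at t" structure can be written down directly. Below is a minimal sketch that simplifies to a one-dimensional elementary cellular automaton on a ring, using Wolfram rule numbering. The 1D ring, the rule numbering and the helper names `step`/`parents` are assumptions for illustration. The thread's scenario is a two-dimensional slice.

```python
def step(cells: list[int], rule: int) -> list[int]:
    """One update of an elementary cellular automaton on a ring.

    The new state of cell i is looked up in the 8-bit rule number using the
    neighbourhood (left, self, right) read as a 3-bit index.
    """
    n = len(cells)
    return [
        (rule >> (4 * cells[(i - 1) % n] + 2 * cells[i] + cells[(i + 1) % n])) & 1
        for i in range(n)
    ]

def parents(i: int, n: int) -> list[int]:
    """Causal parents at time t of cell i at time t+1: itself and its two neighbours."""
    return [(i - 1) % n, i, (i + 1) % n]

if __name__ == "__main__":
    row = [0, 0, 0, 1, 0, 0, 0]
    for _ in range(3):
        print(row)
        row = step(row, 90)
```

Each cell-at-time node has exactly three parents, so the space-time history forms a DAG with a fixed local mechanism at every node.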
In this case, causality is really fundamentally there in the laws of physics (in a discrete analog of what we suspect for our actual laws of physics). And though you can’t reach in from the outside of the universe, it’s possible to imagine scenarios where you could do the equivalent of do() on some of the cells in your experiment, though it wouldn’t really be done by acausally changing what happens in the universe—one way to imagine it is that your experiment runs only in a two-dimensional slice surrounded by a “vacuum” of cells in a “zero” state, and you can reach in through that vacuum to change one of the cells in the two-dimensional grid.
But when it comes to how to model this inside a computer, it seems that you can reach all the conclusions you need by “ordinary” probabilistic reasoning: for example, you could start with, say, a uniform joint probability distribution over the state of all cells in your experiment at all times; then you condition on the fact that they fulfill the laws of physics, i.e. the time evolution rule of the cellular automaton; then you condition again on what you know about the boundary conditions, e.g. the fact that your experimental apparatus reaches in through the third dimension at some point to change the state of some cells. It’s extraordinarily inefficient to represent the joint distribution as a giant look-up table of probabilities, but I do not see which inferences you want that you would lose by doing the calculations that way.
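The brute-force procedure described above can be run on a toy example. This minimal sketch makes several assumptions: a hypothetical 3-cell ring, 3 time steps, rule 110, and an apparatus that forces one cell at one time into a fixed state. It starts from a uniform distribution over all 2**(N*T) = 512 space-time histories. It then conditions on the rule holding everywhere except at the forced cell, where it conditions on the forced value instead. Finally it compares the resulting distribution over the final row with a forward simulation that applies do() to that cell.

```python
from collections import Counter
from itertools import product

RULE, N, T = 110, 3, 3          # hypothetical tiny experiment: 3 cells on a ring, times 0..2
SET_T, SET_I, SET_V = 1, 0, 1   # the apparatus forces cell 0 at time 1 into state 1

def local(l, c, r):
    """Elementary CA rule (Wolfram numbering) applied to one neighbourhood."""
    return (RULE >> (4 * l + 2 * c + r)) & 1

def consistent(history):
    """The conditioning event: every cell obeys the rule, except the forced cell, which holds SET_V."""
    for t in range(1, T):
        prev, row = history[t - 1], history[t]
        for i in range(N):
            if (t, i) == (SET_T, SET_I):
                expected = SET_V
            else:
                expected = local(prev[(i - 1) % N], prev[i], prev[(i + 1) % N])
            if row[i] != expected:
                return False
    return True

def by_conditioning():
    """Uniform prior over all histories (a look-up table), conditioned on `consistent`."""
    rows = list(product((0, 1), repeat=N))
    return Counter(h[-1] for h in product(rows, repeat=T) if consistent(h))

def by_intervention():
    """Forward simulation from each initial row, with do() on the forced cell."""
    counts = Counter()
    for row in product((0, 1), repeat=N):
        for t in range(1, T):
            prev = row
            row = tuple(
                SET_V if (t, i) == (SET_T, SET_I)
                else local(prev[(i - 1) % N], prev[i], prev[(i + 1) % N])
                for i in range(N)
            )
        counts[row] += 1
    return counts

def normalize(counts):
    total = sum(counts.values())
    return {state: k / total for state, k in counts.items()}
```

In this toy case the two computations give identical final-row distributions. That holds because the apparatus is modelled as part of the physics, in line with the "reach in through the third dimension" picture. The cost is the exponential enumeration, which is the look-up-table inefficiency conceded above.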
(All of this holds even if the true laws happen to be deterministic in only one direction in time, so that in your experiment you can distinguish a → b → c from c → b → a by reaching in through the third dimension at time b.)
It depends on granularity. If you are talking about your Game of Life world on the level of the rules of the game, that is equivalent to talking about our Universe on the level of the universal wave function. In both cases there are no longer any agents with actuators, and as a result there is no do(.). That is, it’s not that your factorization will be causal; it’s that there is no causality.
But if you are taking a coarser-grained view of your Game of Life world, similar to the macroscopic view of our Universe, where there are agents that can push and prod their environment, then suddenly talking about do(.) becomes useful for getting things done (just as it is useful to talk about addition or derivatives). On this macroscopic level there is causality, but then your statement about all factorizations being causal is false (reversing a causal chain gives an obvious counterexample).
On second thought, the main problem may not be lack of clarity but that your ideas about causality are too speculative and people either lack confidence that your research program (try to reduce Pearl’s do()-based causality to lower-level “causality in physics”) is the right one, or do not see how to proceed.
Both apply for me, but the former is perhaps more relevant at this point. Basically I’m not sure that “do()-based causality” will actually end up playing a role in the ultimate “correct” decision theory (I guess if there is a lack of clarity, it’s about why you think that it will), and in the meantime there are other problems that definitely need to be solved and also seem more approachable.
(To explain why I think “do()-based causality” may not end up playing a role: it seems plausible that in an AI, or at least in decision theory (I wanted to say “theoretical decision theory”, but that seems redundant :)), cognition about “high-level causality” just ends up being handled as a special case by a more general algorithm, similar to how an AI programmed to maximize expected utility wouldn’t specifically need to be hand-coded with natural language processing if it was running on a sufficiently powerful computer.)
ETA: BTW, can you comment on whether my understanding in this comment was correct, and whether it still applies to Eliezer_2012?
You realize I’m arguing against do()-based causality? If not, I was very much unclearer than I thought.
I have never tried to reduce causal arrows to similarity; Barbour does, I don’t. I take causality to be, or be the epistemic conjugate of, something physical and real which was involved in manufacturing this oddly-well-modeled-by-causality universe that we actually live in. They are presently primitive in my model; I have not yet reduced them, except in the obvious sense that they are also formal mathematical relations between points, i.e., causal relations are a special case of logical relations (and yet we still live in a causal universe rather than a merely logical one). I do indeed reduce consciousness to computation and computation to causality, though there’s a step here involving magical reality-fluid about which I am still confused—I have no idea why or what it means for a causal process to be more or less real, either as a result of having more or less Born measure, being instantiated in many places, or for any other reason.
You realize I’m arguing against do()-based causality? If not, I was very much unclearer than I thought.
Maybe it’s just me not updating fast enough. My impression is that when you talked about causality prior to today, you usually mentioned Pearl and never said you disagreed with him on anything, so I assumed you wanted to keep his do()-based causality and just add a layer below it. Were you always against do()-based causality or did you change your mind at some point?
I have never tried to reduce causal arrows to similarity; Barbour does, I don’t.
Hmm, re-reading Timeless Causality, I don’t see how I could have learned that the idea belongs to Barbour and that you disagree with him. It sure sounds like it was your idea.
causal relations are a special case of logical relations (and yet we still live in a causal universe rather than a merely logical one)
Why should we care about causality as decision theorists, if we have decision theories that can deal with logical universes in general, and causal relations are just a special case of logical relations?
Hmm, re-reading Timeless Causality, I don’t see how I could have learned that the idea belongs to Barbour and that you disagree with him. It sure sounds like it was your idea.
This sounds like a high-priority problem, but actually I don’t see any reference to reduction-to-similarity in Timeless Causality, although there’s a lot in Barbour’s book about it. What do you mean by “mind reduces to computation which reduces to causal arrows which reduces to some sort of similarity relationship between configurations”? Unless this is just in the sense that causal mechanisms are logical relations?
I interpreted this paragraph as suggesting that causality reduces to similarity, but given your latest clarifications, I guess what you actually had in mind was that causality tends to produce similarity, and so we can infer causality from similarity.
When two regions of spacetime are timelike separated, we cannot deduce any direction of causality from similarities between them; they could be similar because one is cause and one is effect, or vice versa. But when two regions of spacetime are spacelike separated, and far enough apart that they have no common causal ancestry assuming one direction of physical causality, but would have common causal ancestry assuming a different direction of physical causality, then similarity between them… is at least highly suggestive.
Previously, I thought you considered causality to be a higher-level concept rather than a primitive one, similar to “sound waves” or “speech” as opposed to, say, “particle movements”. That sort of made sense, except that I didn’t know why you wanted to make causality an integral part of decision theory. Now you’re saying that you consider causality to be primitive and a special kind of logical relation, which actually makes less sense to me, and still doesn’t explain why you want to make causality an integral part of decision theory. It makes less sense because if we consider the laws of physics as logical relations, they don’t have a direction. As you said, “Time-symmetrical laws of physics didn’t seem to leave room for asymmetrical causality.” I don’t see how you get around this problem if you take causality to be primitive. But the bigger problem is that (at the risk of repeating myself too many times) I don’t understand your motivation for studying causality, because if I did I’d probably spend more time thinking about it myself and understand your ideas about it better.
I’m trying to think like reality. If causality isn’t a special kind of logic, why is everything in the known universe made out of (a continuous analogue of) causality instead of logic in general? Why not Time-Turners or a zillion other possibilities?
If causality isn’t a special kind of logic, why is everything in the known universe made out of (a continuous analogue of) causality instead of logic in general?
Wait, if causality is a special kind of logic, how does that help answer the question? Don’t we still have to answer why the universe is made of this kind of logic instead of some other?
Why not Time-Turners or a zillion other possibilities?
I don’t understand how lack of Time-Turners makes you think causality is a special kind of logic or why you want to incorporate causality into decision theory (which is still my bigger question). Similar questions could be asked about other features of the universe:
Why does the universe have 3 spatial dimensions instead of a zillion other possibilities?
Why don’t the laws of physics allow information to be destroyed (i.e., why do they never map two different states at time t to the same state at time t+1)?
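For a toy universe, the second question is a mechanical check: a law preserves information exactly when its global update is injective. Below is a minimal sketch that assumes elementary cellular automata on a finite ring, with Wolfram rule numbering and a hypothetical helper name. Note that it only checks one finite ring size. That is not a proof about infinite lattices.

```python
from itertools import product

def step(cells, rule):
    """One global update of an elementary CA on a ring (Wolfram rule numbering)."""
    n = len(cells)
    return tuple((rule >> (4 * cells[(i - 1) % n] + 2 * cells[i] + cells[(i + 1) % n])) & 1
                 for i in range(n))

def preserves_information(rule, n):
    """True if no two distinct states of an n-cell ring share a successor (injective update)."""
    images = {step(s, rule) for s in product((0, 1), repeat=n)}
    return len(images) == 2 ** n

if __name__ == "__main__":
    for rule in (204, 170, 110):   # identity, shift, and the non-invertible rule 110
        print(rule, preserves_information(rule, 6))
```

Rule 110 destroys information because all-ones and all-zeros both map to all-zeros. A decision procedure that works over states and successors never needs to know which kind of rule it is embedded in.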
But we’re not concerned about these questions at the level of decision theory, since it seems possible to have a decision theory that works with an arbitrary number of dimensions, and with both kinds of laws of physics. Similarly, I don’t see why we can’t have a “causality-agnostic” decision theory that works in universes both with and without Time-Turners.
I think the point was more about whether causality should be thought of as a fundamental part of the rules, like this, or whether it’s more useful to think of causality as an abstraction that (ahem, excuse the term) “emerges” from the fundamentals when we try to identify patterns in said fundamentals.
Somewhat akin to how “meaning” exists in a computer program despite none of the bits fundamentally meaning anything, I think. My thoughts are becoming more and more confused as I type, though, which makes me wish I were in an environment better suited to concentration.
You realize I’m arguing against do()-based causality?
Ok, I would like to state for the record that I no longer understand what you mean when you say “factor something as a causal graph” (which may well mean no one else on this site understands either). Basically everything you ever wrote on the subject of causality or causal graphs (other than exposition of standard material) is now a complete mystery to me. In particular, I don’t understand what sorts of graphs are in your paper on Newcomb’s problem, or why those graphs justify drawing any conclusions about Newcomb’s problem.
Graph models are overloaded: there are lots of different models that all have the same graph. You have to explain what you mean if you use graphs.
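Here is one standard kind of illustration of that overloading. Two structural models share the graph A → B and an exogenous noise U ~ Bernoulli(0.5). They agree on every interventional query but give different answers to a counterfactual query. The specific functions and helper names below are hypothetical choices for the sketch.

```python
# Two hypothetical structural models with the same graph A -> B and noise U ~ Bernoulli(0.5).
def b_model1(a, u):
    return u          # B ignores A (still compatible with the graph; the edge goes unused)

def b_model2(a, u):
    return a ^ u      # B = A XOR U

def interventional(f, a):
    """P(B = 1 | do(A = a)), averaging over the uniform noise U."""
    return sum(f(a, u) for u in (0, 1)) / 2

def counterfactual(f, a_obs, b_obs, a_new):
    """Values B would have taken had A been a_new, given observed A = a_obs, B = b_obs:
    keep the noise values consistent with the observation, then re-run B with A = a_new."""
    us = [u for u in (0, 1) if f(a_obs, u) == b_obs]
    return {f(a_new, u) for u in us}
```

Both models give P(B = 1 | do(A = a)) = 0.5 for either value of `a`. After observing A = 1, B = 1, however, the first model says B would still have been 1 had A been 0, and the second says it would have been 0. The graph alone does not settle which reading is meant.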