As a third data point, I used to be very confused about your ideas about causality, but your recent writing has helped a lot. To make embarrassingly clear how very wrong I’ve been able to be, some years ago when you’d told us about TDT but not given details, I thought you had a fully worked-out and justified theory about how a decision agent could use causal graphs to model its uncertainty about the output of platonic computations, and use do() on its own output to compute the utility of different courses of action, and I got very frustrated when I simply couldn’t figure out how to fill in the details of that...
...hmm. (I should probably clarify: when I say “use causal graphs to reason about”, I don’t mean in the ‘trivial’ sense you are actually using where the platonic computations cause other things but are themselves uncaused in the model; I mean some sort of system where different computations and/or logical facts about computations form a non-degenerate graph, and where do() severs one node somewhere in the middle of that graph from its parents.) “And”, I was going to say, “when you finally did tell us more, I had a strong oh moment when you said that you still weren’t able to give a completely satisfying theory/justification, but were reasonably satisfied with the version you had. But I still continued to think that my picture of what you had been trying to do had been correct, only you didn’t have a fully worked-out theory of it, either.” The actual quote that turned into this memory of things seems to be,
Note that this does not solve the remaining open problems in TDT (though Nesov and Dai may have solved one such problem with their updateless decision theory). Also, although this theory goes into much more detail about how to compute its counterfactuals than classical CDT, there are still some visible incompletenesses when it comes to generating causal graphs that include the uncertain results of computations, computations dependent on other computations, computations uncertainly correlated to other computations, computations that reason abstractly about other computations without simulating them exactly, and so on.
But there’s also this:
The three-sentence version is: Factor your uncertainty over (impossible) possible worlds into a causal graph that includes nodes corresponding to the unknown outputs of known computations; condition on the known initial conditions of your decision computation to screen off factors influencing the decision-setup; compute the counterfactuals in your expected utility formula by surgery on the node representing the logical output of that computation.
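Read literally, the surgery step in that quote can be rendered as a toy computation. The sketch below is my own construction, not Eliezer’s formalism: in a Newcomb-style graph, a single node for the logical output of the agent’s program is a common cause of both the agent’s action and the predictor’s guess, and each counterfactual is evaluated by surgically setting that node rather than conditioning on it.

```python
# Toy "surgery on the logical-output node" (my construction, not TDT itself):
# the node ALGO is the platonic output of the agent's program, and it is a
# common cause of both the agent's actual action and the predictor's guess.

def utility(algo_output):
    # Surgery: set ALGO directly; both descendants inherit the set value.
    action = algo_output            # what the agent in fact does
    prediction = algo_output        # the predictor ran the same computation
    box_b = 1_000_000 if prediction == "one-box" else 0
    box_a = 1_000 if action == "two-box" else 0
    return box_a + box_b

# Maximize expected utility over surgically-set outputs.
best = max(["one-box", "two-box"], key=utility)
```

Because the predictor node descends from the same ALGO node, setting ALGO to “one-box” makes the million-dollar box full in that counterfactual, so the maximization picks one-boxing; a CDT-style surgery on the *action* node alone would not.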
And later:
Those of you who’ve read the quantum mechanics sequence can extrapolate from past experience that I’m not bluffing.
Huh. In retrospect I can see how this matches my current understanding of what you’re doing, but comparing this to what I wrote in the first paragraph above (before searching for that post), it’s actually surprisingly nonobvious where the difference is between what you wrote back then and what I wrote just now to explain the way in which I had horribly misunderstood you...
Anyway. As for what you wrote in the great-grandparent, I had to read it slowly, but most of it makes perfect sense to me; the last paragraph I’m not quite as sure about, but there too I think I understand what you mean.
There is, however, one major point on which I currently feel confused. You seem to be saying that causal reasoning should be seen as a very fundamental principle of epistemology, and on your list of open problems, you have “Better formalize hybrid of causal and mathematical inference.” But it seems to me that if you just do inference about logical uncertainty, and the mathematical object you happen to be interested in is a cellular automaton or the PDE giving the time evolution of some field theory, then your probability distribution over the state at different times will necessarily happen to factor in such a way that it can be represented as a causal model. So why treat causality as something fundamental in your epistemology, and then require deep thinking about how to integrate it with the rest of your reasoning system, rather than treating it as an efficient way to compress some probability distributions, which then just automatically happens to apply to the mathematical objects representing our actual physics? (At this point, I ask this question not as a criticism, but simply to illustrate my current confusion.)
So why treat causality as something fundamental in your epistemology, and then require deep thinking about how to integrate it with the rest of your reasoning system, rather than treating it as an efficient way to compress some probability distributions, which then just automatically happens to apply to the mathematical objects representing our actual physics?
Because causality is not about efficiently encoding anything. A causal process a → b → c is equally efficiently encoded via c → b → a.
But it seems to me that if you just do inference about logical uncertainty, and the mathematical object you happen to be interested in is a cellular automaton or the PDE giving the time evolution of some field theory, then your probability distribution over the state at different times will necessarily happen to factor in such a way that it can be represented as a causal model.
This is not true, for lots of reasons, one of them having to do with “observational equivalence.” A given causal graph has many different graphs with which it agrees on all observable constraints. All these other graphs are not causal. The 3-node chain above is one example.
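The observational-equivalence point can be made concrete. In the sketch below (arbitrary illustrative numbers), the chains a → b → c and c → b → a are parameterized, via Bayes’ rule, to agree on the entire joint distribution over (a, b, c), yet they disagree about what happens under do(b = 1):

```python
from itertools import product

# Forward chain a -> b -> c: joint = P(a) * P(b|a) * P(c|b).
# (Numbers are arbitrary illustrative choices.)
p_a = {0: 0.7, 1: 0.3}
p_b_given_a = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
p_c_given_b = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.1, 1: 0.9}}

def joint(a, b, c):
    return p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]

# Marginals/conditionals that parameterize the REVERSED chain c -> b -> a.
p_b = {b: sum(joint(a, b, c) for a, c in product((0, 1), repeat=2)) for b in (0, 1)}
p_c = {c: sum(joint(a, b, c) for a, b in product((0, 1), repeat=2)) for c in (0, 1)}
p_b_given_c = {c: {b: sum(joint(a, b, c) for a in (0, 1)) / p_c[c] for b in (0, 1)}
               for c in (0, 1)}
p_a_given_b = {b: {a: sum(joint(a, b, c) for c in (0, 1)) / p_b[b] for a in (0, 1)}
               for b in (0, 1)}

def joint_reversed(a, b, c):
    return p_c[c] * p_b_given_c[c][b] * p_a_given_b[b][a]

# Observationally the two graphs are indistinguishable...
for a, b, c in product((0, 1), repeat=3):
    assert abs(joint(a, b, c) - joint_reversed(a, b, c)) < 1e-12

# ...but do(b=1) differs: in a -> b -> c it gives P(c|b=1); in c -> b -> a,
# cutting b from its parent c leaves c at its unconditioned marginal.
p_c1_do_b1_forward = p_c_given_b[1][1]   # 0.9
p_c1_do_b1_reversed = p_c[1]             # 0.555
```

So the two factorizations compress the same distribution equally well, but only one of them answers interventional questions correctly, which is the sense in which the others “are not causal.”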
Sorry, I understand the technical point about causal graphs you are referring to, but I do not understand the argument you’re trying to make with it in this context.
Suppose it’s the year 2100, and we have figured out the true underlying laws of physics, and it turns out that we run on a cellular automaton, and we have some very large and energy-intensive instruments that allow us to set up experiments in which we can precisely prepare the states of individual primitive cells. Now we want to use probabilistic reasoning to examine the time evolution of a cluster of such cells if we have only probabilistic information about the boundary conditions. Since this is a completely ordinary cellular automaton, we can describe it using a causal model, where the state of a cell at time t+1 is caused by its own state and the state of its neighbours at time t.
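That causal reading of the update rule can be made explicit in a few lines. As a minimal stand-in for the true physics (my choice, purely illustrative): rule 90, where each cell at time t+1 is the XOR of its two neighbours at time t, on a periodic 1-D grid.

```python
# A minimal 1-D cellular automaton (rule 90: XOR of the two neighbours)
# written as an explicitly causal update: the state of cell i at time t+1
# is a function only of cells i-1 and i+1 at time t.

def step(state):
    n = len(state)
    return [state[(i - 1) % n] ^ state[(i + 1) % n] for i in range(n)]

# Evolve a single live cell for a few steps; each row is "caused" by the
# previous row and nothing else.
history = [[0, 0, 0, 1, 0, 0, 0]]
for _ in range(3):
    history.append(step(history[-1]))
```

The directed structure (row t → row t+1, with each cell depending only on its neighbourhood) is exactly the causal graph the paragraph above describes.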
In this case, causality is really fundamentally there in the laws of physics (in a discrete analog of what we suspect for our actual laws of physics). And though you can’t reach in from the outside of the universe, it’s possible to imagine scenarios where you could do the equivalent of do() on some of the cells in your experiment, though it wouldn’t really be done by acausally changing what happens in the universe—one way to imagine it is that your experiment runs only in a two-dimensional slice surrounded by a “vacuum” of cells in a “zero” state, and you can reach in through that vacuum to change one of the cells in the two-dimensional grid.
But when it comes to how to model this inside a computer, it seems that you can reach all the conclusions you need by “ordinary” probabilistic reasoning: For example, you could start with, say, a uniform joint probability distribution over the state of all cells in your experiment at all times; then you condition on the fact that they fulfill the laws of physics, i.e. the time evolution rule of the cellular automaton; then you condition again on what you know about the boundary conditions, e.g. the fact that your experimental apparatus reaches in through the third dimension at some point to change the state of some cells. It’s extraordinarily inefficient to represent the joint distribution as a giant look-up table of probabilities, but I do not see which desired inferences you would lose by doing the calculations that way.
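The “giant look-up table” procedure just described can be carried out literally at toy sizes. The sketch below is my own construction under hypothetical parameters: 3 cells over 3 time steps, rule-90 dynamics with zero boundaries, and an apparatus that reaches in to clamp cell 1 at time 1. It puts a uniform prior over all 2^9 conceivable histories and simply filters by the laws plus the intervention.

```python
from itertools import product

CELLS, STEPS = 3, 3

def step(row):
    # rule 90 with fixed zero boundary cells
    padded = [0] + list(row) + [0]
    return tuple(padded[i - 1] ^ padded[i + 1] for i in range(1, CELLS + 1))

def lawful(history):
    # dynamics hold everywhere except where the apparatus clamps cell 1
    # at time 1 to the value 1 (the "reach in through the third dimension")
    for t in range(STEPS - 1):
        predicted = list(step(history[t]))
        if t + 1 == 1:
            predicted[1] = 1          # intervention overrides the law here
        if tuple(predicted) != history[t + 1]:
            return False
    return True

# Uniform prior over ALL assignments of states to the 3x3 history grid,
# then conditioning = filtering, since the prior is uniform.
all_histories = [tuple(zip(*[iter(bits)] * CELLS))
                 for bits in product([0, 1], repeat=CELLS * STEPS)]
posterior = [h for h in all_histories if lawful(h)]
```

One lawful history survives per possible initial row (8 in total), and every surviving history has the clamped cell set as the apparatus demands, so the brute-force conditioning does recover the interventional prediction at these toy sizes.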
(All of this holds even if the true laws happen to be deterministic in only one direction in time, so that in your experiment you can distinguish a → b → c from c → b → a by reaching in through the third dimension at time b.)
It depends on granularity. If you are talking about your Game of Life world on the level of the rules of the game, that is equivalent to talking about our Universe on the level of the universal wave function. In both cases there are no more agents with actuators and no more do(.), as a result. That is, it’s not that your factorization will be causal, it’s that there is no causality.
But if you are taking a more granular view of your Game of Life world, similar to the macroscopic view of our Universe, where there are agents that can push and prod their environment, then suddenly talking about do(.) becomes useful for getting things done (just like it is useful to talk about addition or derivatives). On this macroscopic level, there is causality, but then your statement about all factorizations being causal is false (due to obvious examples involving reversing causal chains, for example).