Agree it’s not totally right to call this a causal relationship.
That said:
The contents of 3 envelopes does seem causally upstream of the contents of 10 envelopes.
If Alice’s perception is imperfect (in any possible world), then “what Alice perceived” is not identical to “the contents of 3 envelopes” and so is not strictly before “what Bob perceived” (unless there is some other relationship between them).
If Alice’s perception is perfect in every possible world, then there is no possible way to intervene on Alice’s perception without intervening on the contents of the 3 envelopes. So it seems like a lot rests on whether you are restricting your attention to possible worlds.
Even if Alice’s perception is perfect (or if Bob is guaranteed to tell Alice the contents of the 3 envelopes) we can still imagine an intervention on Alice’s perception, and in your stories it seems like that’s what makes it feel like Alice’s perception isn’t upstream of Bob’s perception. But it feels to me like this imagination ought to track subjective possibility, even if in fact it is probably logically necessary that Alice perceives correctly / Bob reports correctly / whatever.
So I do feel like there’s a case to be made that it captures everything we should care about with respect to causality.
For example, it seems unlikely to me that decision theory should depend on what happens in obviously impossible worlds. If we want to depend on impossible worlds it seems like it will usually happen by introducing a more naive epistemic state from which those worlds are subjectively possible—in which case we can talk about the FFS definition with respect to that epistemic state.
(I have no idea if this perspective is endorsed by Scott or if it would stand up to scrutiny.)
I think I (at least locally) endorse this view, and I think it is also a good pointer to what seems to me to be the largest crux between my theory of time and Pearl’s theory of time.
I feel that interpreting “strictly before” as causality is making me more confused.
For example, here’s a scenario with a randomly changed message. Bob peeks at ten regular envelopes and a special envelope that gives him a random boolean. Then Bob tells Alice the contents of either the first three envelopes or the second three, depending on the boolean. Now Alice’s knowledge depends on six out of ten regular envelopes and the special one, so it’s still “strictly before” Bob’s knowledge. And since Alice’s knowledge can be computed from Bob’s knowledge but not vice versa, in FFS terms that means the “cause” can be (and in fact is) computed from the “effect”, but not vice versa. My causal intuition is just blinking at all this.
Here’s another scenario. Alice gets three regular envelopes and accurately reports their contents to Bob, and a special envelope that she keeps to herself. Then Bob peeks at seven more envelopes. Now Alice’s knowledge isn’t “before” Bob’s, but if later Alice predictably forgets the contents of her special envelope, her knowledge becomes “before” Bob’s, even though the special envelope had no effect on the information Alice gave to Bob and didn’t affect the causal arrow in any possible world. And if we insist that FFS=causality, then by forgetting the envelope, Alice travels back in time to become the cause of Bob’s knowledge in the past. That’s pretty exotic.
I partially agree, which is partially why I am saying time rather than causality.
I still feel like there is an ontological disagreement: it feels like you are objecting to saying that the physical thing that is Alice’s knowledge is (or is not) before the physical thing that is Bob’s knowledge.
In my ontology:
1) the information content of Alice’s knowledge is before the information content of Bob’s knowledge. (I am curious if this part is controversial.)
and then,
2) there is in some sense no more to say about the physical thing that is e.g. Alice’s knowledge beyond the information content.
So, I am not just saying Alice is before Bob, I am also saying e.g. Alice is before Alice+Bob, and I can’t disentangle these statements because Alice+Bob=Bob.
I am not sure what to say about the second example. I am somewhat rejecting the dynamics. “Alice travels back in time” is another way of saying that the high level FFS time disagrees with the standard physical time, which is true. The “high level” here is pointing to the fact that we are only looking at the part of Alice’s brain that is about the envelopes, and thus talking about coarser variables than e.g. Alice’s entire brain state in physical time. And if we are in the ontology where we are only looking at the information content, taking a high level version of a variable is the kind of thing that can change its temporal properties, since you get an entirely new variable.
I suspect most of the disagreement is in the sort of “variable nonrealism” of reducing the physical thing that is Alice’s knowledge to its information content?
Not sure we disagree, maybe I’m just confused. In the post you show that if X is orthogonal to X XOR Y, then X is before Y, so you can “infer a temporal relationship” that Pearl can’t. I’m trying to understand the meaning of the thing you’re inferring—“X is before Y”. In my example above, Bob tells Alice a lossy function of his knowledge, and Alice ends up with knowledge that is “before” Bob’s. So in this case the “before” relationship doesn’t agree with time, causality, or what can be computed from what. But then what conclusions can a scientist make from an inferred “before” relationship?
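(To make the premise of that inference concrete, here is a minimal sketch, my own illustration rather than anything from the post: take independent biased bits X and an auxiliary bit E, set Y = X xor E, and then X xor Y just recovers E, so X is statistically independent of X XOR Y, which is the independence the orthogonality condition is tracking. The specific biases below are arbitrary.)

```python
# Minimal sketch (my illustration, not from the post) of a joint distribution
# where X is independent of X xor Y, the premise of the "X is before Y" inference:
# take independent bits X and E with arbitrary biases and set Y = X xor E.
from itertools import product

p_x, p_e = 0.3, 0.8  # biases in "general position" (not 0, 1, or 1/2)

joint = {}  # joint distribution over (X, Y) with Y = X xor E
for x, e in product([0, 1], repeat=2):
    p = (p_x if x else 1 - p_x) * (p_e if e else 1 - p_e)
    joint[(x, x ^ e)] = joint.get((x, x ^ e), 0.0) + p

def prob(event):
    return sum(p for (x, y), p in joint.items() if event(x, y))

# X is independent of X xor Y (which just recovers E) ...
lhs = prob(lambda x, y: x == 1 and x ^ y == 1)
rhs = prob(lambda x, y: x == 1) * prob(lambda x, y: x ^ y == 1)
print(abs(lhs - rhs) < 1e-12)  # True
# ... while Y is computed from the two independent bits X and E, which is the
# picture in which X is settled first and Y is settled afterwards.
```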
I don’t have a great answer, which isn’t a great sign.
I think the scientist can infer things like: “Algorithms reasoning about the situation are more likely to know X but not Y than they are to know Y but not X, because reasonable processes for learning Y tend to learn enough information to determine X, but then forget some of that information.” But why should I think of that as time?
I think the scientist can infer things like “If I were able to factor the world into variables, and draw a DAG (without determinism) that is consistent with the distribution with no spurious independencies (including in deterministic functions of the variables), and X and Y happen to be variables in that DAG, then there will be a path from X to Y.”
The scientist can infer that if Z is orthogonal to Y, then Z is also orthogonal to X. This is important because “Z is orthogonal to Y” can be thought of as saying that Z is useless for learning about Y (and importantly a version of “useless for learning” that is closed under common refinement, so if you collect a bunch of different Z orthogonal to Y, you can safely combine them, and the combination will still be orthogonal to Y).
This doesn’t seem to get at why we want to call it before. Hmm.
Maybe I should just list a bunch of reasons why it feels like time to me (in no particular order):
It seems like it gets a very reasonable answer in the Game of Life example.
Prior to this theory, I thought that it made sense to think of time as a closure property on orthogonality, and this definition of time is exactly that closure property: X is weakly before Y if whenever Z is orthogonal to Y, Z is also orthogonal to X (where the definition of orthogonality is justified by the fundamental theorem). I write this out symbolically just after this list.
If Y is a refinement of X, then Y cannot be strictly before X. (I notice that I don’t have a thing to say about why this feels like time to me, and indeed it feels like it is in direct opposition to your “doesn’t agree with what can be computed from what,” but it does agree with the way I feel like I want to intuitively describe time in the stories told in the “Saving Time” post.) (I guess one thing I can say is that as an agent learns over time, we think of the agent as collecting information, so later=more information makes sense.)
History looks a lot like a non-quantitative version of entropy, where instead of thinking of how much randomness goes into a variable, we think of which randomness goes into the variable. There are lemmas towards proving the semigraphoid axioms which look like theorems about entropy modified to replace sums/expectations with unions. Then, “after” exactly corresponds to “greater entropy” in this analogy.
If I imagine X and Z being computed independently, and Y as being computed from X and Z, it will say that X is before Y, which feels right to me (and indeed this property is basically the definition). It seems like my time is maybe the unique thing that gets the right answer on this simple story while also treating variables with the same information content as the same.
We can convert a Pearlian DAG to an FFS, and under this conversion, d-separation is sent to conditional orthogonality, and paths between nodes are sent to time (on the questions Pearl knows how to ask; we also generalize the definition to all variables).
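To write the closure property from the orthogonality bullet above out symbolically (my notation: $\preceq$ for “weakly before” and $\perp$ for the orthogonality relation justified by the fundamental theorem):

$$X \preceq Y \;\iff\; \forall Z\;\big(Z \perp Y \implies Z \perp X\big),$$

and “strictly before” can then be read as the strict part of this preorder, i.e. X weakly before Y but not Y weakly before X.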
Thanks for the response! Part of my confusion went away, but some still remains.
In the game of life example, couldn’t there be another factorization where a later step is “before” an earlier one? (Because the game is non-reversible and later steps contain less and less information.) And if we replace it with a reversible game, don’t we run into the problem that the final state is just as good a factorization as the initial?
Yep, there is an obnoxious number of factorizations of a large game of life computation, and they all give different definitions of “before.”
I think your argument about entropy might have the same problem. Since classical physics is reversible, if we build something like a heat engine in your model, all randomness will be already contained in the initial state. Total “entropy” will stay constant, instead of growing as it’s supposed to, and the final state will be just as good a factorization as the initial. Usually in physics you get time (and I suspect also causality) by pointing to a low probability macrostate and saying “this is the start”, but your model doesn’t talk about macrostates yet, so I’m not sure how much it can capture time or causality.
That said, I really like how your model talks only about information, without postulating any magical arrows. Maybe it has a natural way to recover macrostates, and from them, time?
Wait, I misunderstood, I was just thinking about the game of life combinatorially, and I think you were thinking about temporal inference from statistics. The reversible cellular automaton story is a lot nicer than you’d think.
If you take a general reversible cellular automaton (Critters, for concreteness), and have a distribution over computations in general position in which the cells of the initial condition are independent, the cells may not be independent at future time steps.
If all of the initial probabilities are 1⁄2, you will stay in the uniform distribution, but if the probabilities are in general position, things will change, and time 0 will be special because of the independence between cells.
There will be other events at later times that will be independent, but those later time events will just represent “what was the state at time 0.”
For a concrete example consider the reversible cellular automaton that just has 2 cells, X and Y, and each time step it keeps X constant and replaces Y with X xor Y.
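Here is a minimal simulation of that two-cell example (my own sketch; the biases 0.3 and 0.7 are just an arbitrary “general position” choice). The cells are independent at time 0, dependent at time 1, and independent again at time 2, but only because two steps return the automaton to its time-0 state, matching the point that later independent events just encode what the state was at time 0.

```python
# Two-cell reversible automaton sketch (my illustration): each step keeps X
# and replaces Y with X xor Y. With independent general-position initial
# biases, the cells are independent at t=0, dependent at t=1, and independent
# again at t=2 only because the t=2 state is literally the t=0 state again.
from itertools import product

p_x, p_y = 0.3, 0.7  # independent "general position" biases for the two cells

def step(x, y):
    # one step of the automaton: keep X, replace Y with X xor Y
    return x, x ^ y

def joint_at_time(t):
    """Joint distribution over the two cells after t steps."""
    dist = {}
    for x0, y0 in product([0, 1], repeat=2):
        p = (p_x if x0 else 1 - p_x) * (p_y if y0 else 1 - p_y)
        state = (x0, y0)
        for _ in range(t):
            state = step(*state)
        dist[state] = dist.get(state, 0.0) + p
    return dist

def independent(dist):
    px = {v: sum(p for (x, _), p in dist.items() if x == v) for v in (0, 1)}
    py = {v: sum(p for (_, y), p in dist.items() if y == v) for v in (0, 1)}
    return all(abs(dist.get((x, y), 0.0) - px[x] * py[y]) < 1e-12
               for x, y in product([0, 1], repeat=2))

for t in range(3):
    print(t, independent(joint_at_time(t)))
# prints: 0 True, 1 False, 2 True -- and since Y goes X^Y -> X^(X^Y) = Y,
# the "independent events" at time 2 are just re-encodings of the time-0 state.
```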
Wait, can you describe the temporal inference in more detail? Maybe that’s where I’m confused. I’m imagining something like this:
Check which variables look uncorrelated
Assume they are orthogonal
From that orthogonality database, prove “before” relationships
Which runs into the problem that if you let a thermodynamical system run for a long time, it becomes a “soup” where nothing is obviously correlated to anything else. Basically the final state would say “hey, I contain a whole lot of orthogonal variables!” and that would stop you from proving any reasonable “before” relationships. What am I missing?
I think that you are pointing out that you might get a bunch of false positives in your step 1 after you let a thermodynamical system run for a long time, but they are only approximate false positives.
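As a toy numeric illustration of the “approximate false positive” point (my own sketch, not from the thread): xor a biased bit together with more and more other independent biased bits, and the statistical dependence between the single bit and the xor shrinks geometrically, so a finite-sample independence check will eventually call them orthogonal even though the xor is a deterministic function of bits that include it.

```python
# Toy illustration (my own sketch): S_n = X_1 xor ... xor X_n with independent
# Bernoulli(p) bits. S_n is a deterministic function of the X_i, yet its
# dependence on any single X_i decays geometrically, so an empirical
# independence test would eventually call them "orthogonal".
p = 0.3  # bias of each independent bit, in general position

def xor_bias(n, p):
    """P(X_1 xor ... xor X_n = 1) for n independent Bernoulli(p) bits."""
    return (1 - (1 - 2 * p) ** n) / 2

for n in [2, 5, 10, 20, 40]:
    # dependence of S_n on X_1, measured as P(S_n=1 | X_1=1) - P(S_n=1 | X_1=0)
    gap = (1 - xor_bias(n - 1, p)) - xor_bias(n - 1, p)
    print(n, abs(gap))  # shrinks like |1 - 2p|**(n - 1)
```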
I think my model has macro states. In game of life, if you take the entire grid at time t, that will have full history regardless of t. It is only when you look at the macro states (individual cells) that my time increases with game of life time.
As for entropy, here is a cute observation (with unclear connection to my framework): whenever you take two independent coin flips (with probabilities not 0, 1, or 1/2), their xor will always be higher entropy than either of the individual coin flips.
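A quick numeric check of that observation (my own sketch): the xor of independent coins with biases p and q is a coin with bias p + q - 2pq, and its binary entropy exceeds both individual entropies whenever p and q avoid 0, 1, and 1/2.

```python
# Quick numeric check (my own sketch): for independent coins with biases p, q
# not in {0, 1, 1/2}, the xor has strictly higher binary entropy than either coin.
import math
import random

def H(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for _ in range(5):
    p, q = random.uniform(0.01, 0.99), random.uniform(0.01, 0.99)
    r = p + q - 2 * p * q  # bias of the xor of the two independent coins
    print(f"H(xor)={H(r):.3f}  max(H(p),H(q))={max(H(p), H(q)):.3f}  {H(r) >= max(H(p), H(q))}")
```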