Some Reflections on Branching vs Probability
The “Coin Universe”
Imagine a universe divided into subsystems A and B, where causal influences go from A to B but not vice versa. A is extremely simple—each day either a “heads event” or “tails event” takes place, which is visible to observers in B as a red or green patch at a certain place in the sky. In fact, coin events are the only ‘communication’ between A and B.
Naturally, the observers in B would like to understand the pattern of coin events, so they formulate some theories.
Two Rival Theories
Theory 1: Every day, the universe ‘splits’ into two. The entire contents of B are ‘copied’ somehow. One copy sees a heads event, the other copy sees a tails event.
Theory 2: There is no splitting. Instead, some kind of (deterministic or stochastic) process in A is producing the sequence of events. As a special case (Theory 2a) it could be that each coin event is independent and random, and has a probability 1⁄2 of being heads. (Let’s also write down another special case (Theory 2b) where each coin event has probability 9⁄10 of being heads.)
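(For concreteness, here is a minimal sketch in Python of what Theories 2a and 2b assert, treating each day’s coin event as an independent draw. The function name coin_events is mine, purely for illustration.)

```python
import random

def coin_events(p_heads, days, seed=0):
    """Illustrative sketch (names mine): a list of daily coin events,
    each independently 'heads' with probability p_heads."""
    rng = random.Random(seed)
    return ["heads" if rng.random() < p_heads else "tails" for _ in range(days)]

theory_2a = coin_events(p_heads=0.5, days=10_000)  # Theory 2a: p = 1/2
theory_2b = coin_events(p_heads=0.9, days=10_000)  # Theory 2b: p = 9/10

print(theory_2a.count("heads") / len(theory_2a))   # close to 0.5
print(theory_2b.count("heads") / len(theory_2b))   # close to 0.9
```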
The Position of Jupiter
Imagine that we’re primitive astronomers trying to understand the trajectory of Jupiter through the sky. We take for granted that our observations are indexed by a variable called “time” and that a complete theory of Jupiter’s trajectory would have the form: “Position of Jupiter at time t = F(t)” for some function F that we can calculate. Suppose such a theory is formulated.
However, if we believe in a point P such that P is “the position of Jupiter” then the theory does not resolve all of our uncertainty about P, for it merely tells us: “If you ask the question at time t then the answer is F(t).” If the question is asked “timelessly” then there is no unique answer. There isn’t even a probability distribution over the set of possible answers, because there is no ‘probability distribution over time’.1
Theories 1 and 2 are Incommensurable
Can the scientists in the Coin Universe decide empirically between Theories 1 and 2a? Imagine that our scientists already have a ‘theory of everything’ for B’s own physics, and that coin events are the only phenomena about which there remains controversy.
Barbara believes Theory 2a and thinks that the probability that the next toss is heads is 1⁄2. Alfred believes Theory 1 and thinks that the concept of “tomorrow’s coin event” is as meaningless as the concept of “the (timeless) position of Jupiter”. Whereas Barbara indexes time using real numbers, Alfred indexes time by pairs consisting of a real number and a sequence of prior coin events. Barbara cannot use Bayesian reasoning to discriminate between Theories 1 and 2a because, from her perspective, Theory 1 is incomplete—it refuses to make a prediction about “tomorrow’s coin event”. On the other hand, when Alfred tries to put Barbara’s theory into a form that he can test, by removing the meaningless notion of “tomorrow’s coin event”, he discovers that what’s left is exactly his own theory.
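Here is a minimal sketch of the two indexing schemes (the class names are mine, purely for illustration):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BarbaraIndex:
    """Barbara's index for an observation: a single real-valued time."""
    t: float

@dataclass(frozen=True)
class AlfredIndex:
    """Alfred's index: a time together with the sequence of prior coin
    events, since under Theory 1 there is one copy of B per branch."""
    t: float
    history: tuple  # e.g. ("heads", "tails") after two days

# For Barbara, "tomorrow's coin event" picks out a single index...
barbara_tomorrow = BarbaraIndex(t=1.0)

# ...but for Alfred it picks out one index per branch, so the phrase
# has no unique referent:
alfred_tomorrows = [AlfredIndex(t=1.0, history=("heads",)),
                    AlfredIndex(t=1.0, history=("tails",))]
```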
In fact, the same problem arises for every variation of Theory 2 (except those which sometimes predict a probability of 1 or 0). Variations of Theory 2 can be tested against each other, but not against Theory 1.
Why Should ‘Splitting’ Entail That Probabilities Aren’t Well Defined?
If the universe is splitting then a ‘history of the universe’ looks like a branching tree rather than a line. Now, I’m taking for granted an ‘objective’ concept of probability in which the set Ω of possible histories of the universe has the structure of a ‘probability space’, so that the only things which can be assigned (‘objective’) probabilities are subsets of Ω. So for any event E with a well-defined probability, and any possible history H, E must either contain all of H or else none of it. Hence, it makes no sense to single out one branch within H and ask about “the probability that this branch is taken”. (Any more than we can look at a particular person in the world and ask “what is the probability of being this person?”)
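A toy formalization may help here, assuming (purely for illustration) a uniform measure over histories:

```python
from itertools import product
from fractions import Fraction

DAYS = 3  # illustrative sketch; names and the uniform measure are mine

def prob(event, omega):
    """Probability of an event, i.e. of a subset of Omega."""
    assert event <= omega
    return Fraction(len(event), len(omega))

# Under Theory 2, a possible history is a single linear coin sequence:
omega_linear = set(product("HT", repeat=DAYS))
first_day_heads = {h for h in omega_linear if h[0] == "H"}
print(prob(first_day_heads, omega_linear))      # 1/2 -- a genuine event

# Under Theory 1, the one possible history is the entire branching tree,
# so Omega is a singleton and every event has probability 0 or 1.
# "Branch ('H','H','H') is taken" is not a subset of this Omega at all.
whole_tree = frozenset(product("HT", repeat=DAYS))
omega_branching = {whole_tree}
print(prob(omega_branching, omega_branching))   # 1
```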
A natural response might be as follows:
Surely if time is a branching tree then all we have to do to define the probabilities of branches is say that the probability of a ‘child node’ is the probability of its parent divided by the number of ‘children’. So we could simulate Theory 2a within a purely deterministic universe by having one ‘heads branch’ and one ‘tails branch’ each day of the simulation, or we could simulate Theory 2b instead by having nine ‘heads branches’ and only one ‘tails’.
Observers in the first simulation would experience 1⁄2 probabilities of heads, while observers in the second would experience 9⁄10 probabilities.
(Note: We can make this idea of “experiencing 1⁄2” or “experiencing 9⁄10” more vivid by supposing that ‘days’ happen extremely quickly, so that experiencing 1⁄2 would mean that the relevant patch of sky is flickering between red and green so fast that it looks yellow, whereas 9⁄10 would equate to an orangey-red.)
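The counting rule in this response is easy to state precisely. A sketch (the function name is mine):

```python
from fractions import Fraction

def branch_probability(outcomes, copies):
    """Illustrative sketch (names mine): probability of observing a given
    outcome sequence when, each day, every branch splits into the given
    numbers of child copies and each child inherits an equal share of its
    parent's probability.

    copies maps a day's outcome to its number of child branches,
    e.g. {"H": 1, "T": 1} to mimic Theory 2a, {"H": 9, "T": 1} for 2b."""
    per_day_total = sum(copies.values())
    p = Fraction(1)
    for outcome in outcomes:
        p *= Fraction(copies[outcome], per_day_total)
    return p

print(branch_probability(("H", "H"), {"H": 1, "T": 1}))  # 1/4    (Theory 2a)
print(branch_probability(("H", "H"), {"H": 9, "T": 1}))  # 81/100 (Theory 2b)
```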
The Lesson of the Ebborians
Consider the Ebborian universe. It has five dimensions: three ‘ordinary’ spatial dimensions, one of time, and an ‘extra’ dimension reserved for ‘splitting’. If you were to draw a cross section of the Ebborian universe along the temporal and the ‘extra’ dimension, you would see a branching tree. Suppose for simplicity that all such cross sections look alike, so that each time the universe ‘splits’ it happens everywhere simultaneously. Now, a critical point in Eliezer’s fable is that the branches have thickness, and the subjective probability of finding yourself in a child branch is supposed to be proportional to the square of that branch’s thickness. For my purposes I want branches to have widths, such that the width of a parent branch equals the sum of the widths of its children, but I want to discard the idea of squaring. Imagine that the universe ‘splits’ only once a day, when a red or green light appears somewhere in the sky, which the Ebborians call a “heads event” or a “tails event” respectively. Hmm, this sounds familiar...
Prima facie it seems that we’ve reconstructed a version of the Coin Universe which (a) contains “splitting” and (b) contains “objective probabilities” (which “clearly” ought to be proportional to the widths of branches).
What I want to ask is: why exactly should ‘width along the extra dimension’ be proportional to ‘probability’? One possible answer would be “that’s just what the extra dimension is. It’s intrinsically a ‘dimension of probability’.” That’s fine, I guess, but then I want to say that the difference between this Coin Universe and one described by Theory 2 is purely verbal. But now suppose the extra dimension is just an ordinary spatial dimension (whatever that means). Then where does the rule ‘probability = thickness’ come from, when there are so many other possibilities? E.g. “At any branch point, the probability of taking the left branch is proportional to (9/10) × the width of the left branch, and the probability of the right branch is proportional to (1/10) × the width of the right branch, the two being normalized to sum to 1.” If this were the rule then even though uniform branch widths might suggest that Theory 2a is correct, the ‘experienced probabilities’ would be those of Theory 2b. (If days were extremely rapid, the sky would look orange-red rather than yellow.)
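Here is that rival bridge law as a sketch, with the normalization made explicit (the function name and parameters are mine):

```python
def biased_bridge_law(width_left, width_right, bias_left=0.9, bias_right=0.1):
    """Illustrative sketch (names mine): a rival bridge law under which a
    branch's probability is proportional to bias * width rather than to
    width alone, normalized so the two probabilities sum to 1."""
    raw_left = bias_left * width_left
    raw_right = bias_right * width_right
    total = raw_left + raw_right
    return raw_left / total, raw_right / total

# Uniform branch widths, yet the experienced probabilities are those of
# Theory 2b rather than Theory 2a:
print(biased_bridge_law(width_left=0.5, width_right=0.5))  # approx. (0.9, 0.1)
```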
If the extra dimension is not explicitly a ‘dimension of probability’ then the ‘experienced probabilities’ will be indeterminate without a ‘bridge law’ connecting width and probability. But the difference between (“extra dimension is spatial” + bridge law connecting branch widths and probability) and (“extra dimension is probability” + bridge law connecting probabilities with epiphenomenal ‘branch widths’) is purely verbal.
So ultimately the only two possibilities are (i) the extra dimension is a ‘dimension of probability’, and there is no ‘splitting’; or else (ii) the probabilities are indeterminate.
Of various possible conclusions, one in particular seems worth noting down: If we are attempting to simulate a Coin Universe by computing all of its branches at once, then regardless of how we ‘tag’ or ‘weight’ the branches to indicate their supposed probabilities, we should not think that we are thereby affecting the experiences of the simulated beings. (So, ignoring ‘externalities’, there’s no moral imperative to prefer two copies of a happy simulation and one of a sad simulation over two ‘sad’ simulations and one ‘happy’, any more than there is to stick pieces of paper to the computer cases saying “probability 9/10” and “probability 1/10”.)
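A sketch of the point about tags (names mine): the update rule never reads the tags, so relabeling them leaves every simulated experience untouched.

```python
def step(state, outcome):
    """Advance one simulated branch by one day; the update depends only
    on the branch's state and the day's coin outcome."""
    return state + outcome

def simulate_all_branches(days, tag_heads, tag_tails):
    """Illustrative sketch (names mine): compute every branch, attaching a
    'probability' tag to each. The tags are pure bookkeeping; step() never
    reads them, so changing them cannot alter what any branch computes."""
    branches = [("", 1.0)]
    for _ in range(days):
        branches = [(step(state, outcome), tag * t)
                    for state, tag in branches
                    for outcome, t in (("H", tag_heads), ("T", tag_tails))]
    return branches

# Two runs tagged as 'Theory 2a' and as 'Theory 2b' compute identical branches:
run_2a = simulate_all_branches(3, tag_heads=0.5, tag_tails=0.5)
run_2b = simulate_all_branches(3, tag_heads=0.9, tag_tails=0.1)
assert [s for s, _ in run_2a] == [s for s, _ in run_2b]  # experiences unchanged
```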
Implications For MWI?
I don’t want to go into too much detail. For what it’s worth, my current way of thinking is that a quantum theory is neither “deterministic” nor “probabilistic” but just “quantum” (it constitutes “Theory 3”). Perhaps MWI is what you get when you (misguidedly) try to conceive of a quantum theory as deterministic? Two things in particular have suggested this to me: (i) Scott Aaronson’s lecture and (ii) this paper, which goes some way towards refuting what I had previously taken to be one of the strongest reasons for ‘believing in’ many worlds.
Is Probability Reducible?
It’s conspicuous that the discussion above presupposes that probabilities—“real probabilities”—are or might be ‘built in’ at the ‘ground floor’ of reality. However, others have made ingenious attempts to show how (our concepts and perceptions of) probability can arise perfectly well even if the universe doesn’t presuppose it. I’m not averse to this project—in fact it parallels Dennett’s strategy in the philosophy of mind, namely to show how it can ‘seem like’ we have ‘qualia’ even in a world where no such things exist.
Anyway, I seem to be converging on cousin_it’s statement: “Perhaps counterintuitively, the easiest way for probabilities to arise is not by postulating ‘different worlds’ that you could ‘end up’ in starting from now.”
1 Perhaps Julian Barbour would disagree. However, for the purposes of my discussion, I’m presupposing the naive ‘common sense’ view of time where ‘the facts’ about a (classical) universe are exhausted precisely when we’ve specified “the state of affairs at every moment of time”. Another possible objection is that because Jupiter’s position in the sky repeats cyclically, we can define ‘time averages’ after all. Well, let’s just suppose that these astronomers are able to detect the slight deviations due to, e.g., the solar wind pushing Jupiter away and lengthening its orbit. (If you’re still bothered, imagine replacing ‘position of Jupiter’ with ‘brightness of the Sun’, which is gradually increasing on a geological timescale.)