Addresses in the Multiverse
Abstract: If we assume that any universe can be modeled as a computer program which has been running for finitely many steps, then we can assign a multiverse-address to every event by combining its world-program with the number of steps into the world-program where it occurs. We define a probability distribution over multiverse-addresses called a Finite Occamian Multiverse (FOM). FOMs assign negligible probability mass to being a Boltzmann brain or to being in a universe that implements the Many Worlds Interpretation of quantum mechanics.
One explanation of existence is the Tegmark level 4 multiverse, the idea that all coherent mathematical structures exist, and our universe is one of them. To make this meaningful, we must add a probability distribution over mathematical structures, effectively assigning each a degree of existence. Assume that the universe we live in can be fully modeled as a computer program, and that that program, and the number of steps it’s been running for, are both finite. (Note that it’s not clear whether our universe is finite or infinite; our universe is either spatially infinite, or expanding outwards at a rate greater than or equal to the speed of light, but there’s no observation we could make from inside the universe that would distinguish these two possibilities.) Call the program that implements our universe a world-program, W. This could be written in any programming language; it doesn’t much matter which, since we can translate between languages by prepending an interpreter, which changes the program’s length by at most a constant.
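This is the invariance theorem of Kolmogorov complexity, stated here in the notation used below: for any two universal languages L1 and L2,

    len1(W) <= len2(W) + c,

where c is the length of an interpreter for L2 written in L1, and does not depend on W. So the choice of language shifts the probabilities defined below by at most a constant factor.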
Now, suppose we choose a particular event in the universe—an atom emitting a photon, say—and we want to find a corresponding operation in the world-program. We could, in principle, run W until it starts working on the part of spacetime we care about, and count the steps. Call the number of steps leading up to this event T. Taken together, the pair (W,T) uniquely identifies a place, not just in the universe, but in the space of all possible universes. Call any such pair (W,T) a multiverse-address.
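As a minimal sketch of this bookkeeping (the names here, and the step interpreter in particular, are hypothetical stand-ins rather than any real machine model):

    from dataclasses import dataclass
    from typing import Any, Callable

    @dataclass(frozen=True)
    class MultiverseAddress:
        world_program: bytes  # W: the world-program's source
        steps: int            # T: operations performed before the event

    def locate_event(address: MultiverseAddress, step: Callable[[Any], Any]) -> Any:
        # Replay the world-program for T steps and return the machine
        # state at the addressed event. `step` is a hypothetical
        # one-step interpreter for whatever semantics W is written in.
        state = address.world_program  # the program doubles as initial state
        for _ in range(address.steps):
            state = step(state)
        return state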
Now, suppose we observe an event. What should be our prior probability distribution over multiverse-addresses for that event? That is, for a given event (W,T), what is P(W=X and T=Y)?
For this question, our best (and pretty much only possible) tool is Occam’s Razor. We’re after a prior probability distribution, so we aren’t going to bother including all the things we know about W from observation, except that we have good reason to believe that W is short—what we know of physics seems to indicate that at the most basic level, the rules are simple. So, first factor out W and apply Occam’s Razor to it:
P(W=X and T=Y) = P(W=X) * P(T=Y|W=X)
P(W=X and T=Y) = exp(-len(W)) * P(T=Y|W=X)
Now assume independence between T and W. This isn’t entirely correct (some world-programs are structured in such a way that all the events we might be looking for happen on even time-steps, for example), but that kind of entanglement isn’t important for our purposes. Then apply Occam’s Razor to T, getting
P(W=X and T=Y) = exp(-len(W)-len(T))
Now, applying Occam’s Razor to T requires some explanation, and there is one detail we have glossed over: we referred to the lengths of W and T (for the number T, its length is roughly its logarithm), when we should have referred to their Kolmogorov complexity, that is, their length after compression. For example, a world-program that contains 10^10 random instructions is much less likely than one that contains 10^10 copies of the same instruction. Suppose we resolve this by requiring W to be fully compressed, and give it an initialization stage where it unpacks itself before we start counting steps for T.
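A toy version of the resulting prior, using zlib-compressed length as a crude, computable stand-in for true Kolmogorov complexity (which is uncomputable); the function names are illustrative only:

    import zlib

    def description_length(data: bytes) -> int:
        # Bits in the zlib-compressed encoding: a rough, computable
        # upper bound on the true Kolmogorov complexity.
        return 8 * len(zlib.compress(data, 9))

    def fom_log_prior(world_program: bytes, steps: int) -> float:
        # log P(W=X and T=Y) = -len(W) - len(T), up to normalization.
        len_w = description_length(world_program)
        len_t = description_length(str(steps).encode())
        return -float(len_w + len_t)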
This lets us transfer bits of complexity from T to W, by having W run itself for a while during the initialization stage. We can also transfer complexity from W to T, by writing W in such a way that it runs a class of programs in order, with T determining which of them it’s running. Since we can transfer complexity back and forth between W and T, we can’t justify applying Occam’s Razor to one but not the other, so it makes sense to apply it to T. This also means that we should treat T as compressible: it is more likely that the universe is 3^^^3 steps old than that it is 207798236098322674 steps old. (Both points are demonstrated concretely in the sketch below.)
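Both claims are easy to check with an off-the-shelf compressor; the sizes in the comments are approximate:

    import os
    import zlib

    # A world-program of repeated instructions vs. random ones
    # (10^6 bytes rather than 10^10 so the demo runs quickly).
    structured = b"\x01" * 10**6
    random_ish = os.urandom(10**6)
    print(len(zlib.compress(structured, 9)))  # ~1 KB: nearly free to describe
    print(len(zlib.compress(random_ish, 9)))  # ~1 MB: incompressible

    # A structured age is cheaper to describe than an arbitrary one:
    # "10**100" is a 7-character program, while the arbitrary T below
    # can only be written out in full.
    print(len("10**100"), len("207798236098322674"))  # 7 vs. 18 characters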
To recap: we started by assuming that the universe is a computer program, W. We chose an event in W, corresponding to a computation that occurs after W has performed T operations. We assumed that W and T are both finite. Occam’s Razor tells us that W, if fully compressed, should be short. We can trade off complexity between W and T, so we should also apply Occam’s Razor to T and expect that T, if fully compressed, should also be short. We had to assume that the universe behaves like a computer program, that that program is finite, and that the probability distribution which Occam’s Razor gives us is actually meaningful here.
We then got P(W=X and T=Y) = exp(-len(W)-len(T)). Call this probability distribution a Finite Occamian Multiverse (FOM). We can define it in terms of different programming languages, reweighting the probabilities of different multiverse-addresses somewhat, but all FOMs share some interesting properties.
A Finite Occamian Multiverse avoids the Boltzmann brain problem. A Boltzmann brain is a brain that, rather than living in a simulation with stable physics that allow it to continue to exist as the simulation advances, arises by chance out of randomly arranged particles or other simulation-components, and merely thinks (contains a representation of the claim that) it lives in a universe with stable physics. If you live in a FOM, then the probability that you are a Boltzmann brain is negligible: picking one chance fluctuation out of a sea of random arrangements requires a long, incompressible T, so Boltzmann brains must have extremely complex multiverse-addresses, while evolved brains, produced by the ordinary running of simple physics, can have multiverse-addresses that are simple.
If we are in a Finite Occamian Multiverse, then the Many Worlds Interpretation of quantum mechanics must be false, because if it were true, then any multiverse-address would have to contain the complete branching history of the universe, so its length would be proportional to the mass of the universe times the age of the universe. On the other hand, if branches were selected according to a pseudo-random process, then multiverse-addresses would be short. This sort of pseudo-random process would slightly increase the length of W, but drastically decrease the length of T. In other words, in this type of multiverse, worldeaters (processes that prune away all but one branch) eat more complexity than they contain.
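To put the comparison in the same notation, very roughly: under Many Worlds, picking out one event requires at least one bit per branching event along its history, so

    len(T) >= (branchings per particle per step) * (particles) * (steps),

an astronomically large quantity. With a pseudo-random branch selector, len(W) grows only by the length of the generator and its seed, while len(T) falls back to the compressed step count.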
If we are in a Finite Occamian Multiverse, then we might also expect certain quantities, such as the age and volume of the universe, to have much less entropy than otherwise expected. If, for example, we discovered that the universe had been running for exactly 3^^^3+725 time steps, then we could be reasonably certain that we were inside such a multiverse.
This kind of multiverse also sets an upper bound on the total amount of entropy (number of fully independent random bits) that can be gathered in one place, equal to the total complexity of that place’s multiverse-address, since it would be possible to generate all of those bits from the multiverse-address by simulating the universe. However, since simulating the universe is intractable, the universe can still act as a very strong cryptographic pseudorandom number generator.
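A minimal sketch of this last point, with SHA-256 standing in (purely for illustration) for “simulate the universe”:

    import hashlib

    def address_prng(world_program: bytes, steps: int):
        # Deterministic bit stream derived from a multiverse-address.
        # Every output bit is a fixed function of (W, T), so however many
        # bits are drawn, the stream's algorithmic entropy never exceeds
        # len(W) + len(T) + len(this generator): the bound described above.
        seed = world_program + steps.to_bytes(16, "big")
        counter = 0
        while True:
            block = hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
            yield from ((byte >> i) & 1 for byte in block for i in range(8))
            counter += 1

    bits = address_prng(b"example world-program", 3**7)
    print([next(bits) for _ in range(16)])  # hard to predict, yet no new entropy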