Suppose I notice I am a human on Earth in America. I consider two hypotheses. One is that everything is as it seems. The other is that there is a vast conspiracy to hide the fact that America is much bigger than I think—it actually contains one trillion trillion people. It seems like SIA should prefer the conspiracy theory (if the conspiracy is too implausible, just increase the posited number of people until it cancels out).
I am often confused by the kind of reasoning at play in the text I bolded. Maybe someone can help sort me out. As I increase the number of people in the conspiracy world, my prior in that world also decreases. If my prior falls faster than the number of people in the considered world grows, I will not be able to construct a conspiracy-world that allows the thought experiment to bite.
Consider the situation where I arrive at the airport and will wait in line at security. Wouldn’t I be more likely to discover a line 1000 people long than 100 people long? I am 10x more likely to exist in the longer line. The problem is that our prior on 1000-person security lines might be very low. The reasoning on display in the above passage would invite us to simply crank up the length of the line, say, to 1 million people. I suspect that SIA proponents don’t show up at the airport expecting lines this long. Why? Because the prior on a million-person line is more than a thousand times lower than the prior on a 100-person line.
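To put rough numbers on that intuition, here is a toy SIA-style calculation in Python. Every figure is made up for illustration, and the 1/N² prior is just one hypothetical choice of a tail that falls faster than the line grows:

```python
# Toy numbers for the line example above; every figure here is invented.
# SIA-style update: weight each hypothesis by (prior) * (number of observers).

line_lengths = [100, 1_000, 1_000_000]

# A hypothetical prior whose probability of an N-person line falls off
# roughly like 1/N^2, i.e. faster than N grows.
prior = {n: n ** -2 for n in line_lengths}

# SIA weighting: multiply each prior by the number of people in that line.
sia_weight = {n: prior[n] * n for n in line_lengths}

# Normalize over these three hypotheses to get posteriors.
total = sum(sia_weight.values())
posterior = {n: w / total for n, w in sia_weight.items()}

for n in line_lengths:
    print(f"length {n:>9}: prior {prior[n]:.2e}  SIA posterior {posterior[n]:.4f}")

# Because the prior decays like 1/N^2, the factor-of-N boost from SIA never
# catches up: the 100-person line still dominates the posterior (about 0.91).
```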
This also applies to some presentations of Pascal’s mugging.
There’s no principle that says the prior probability of a population exceeding some size N must decrease more quickly than 1/N asymptotically, and the same goes for any other property of a system. Some priors will have this property, some won’t.
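To make that concrete, here is a tiny numerical sketch. Both tails below are invented purely for illustration: one falls faster than 1/N, one more slowly, and the quantity N·P(>N) behaves completely differently in the two cases:

```python
# Two invented priors over population size, distinguished only by tail behavior.

def tail_thin(n):
    """P(population > n) for a prior whose tail falls like 1/n^2 (faster than 1/n)."""
    return n ** -2.0

def tail_heavy(n):
    """P(population > n) for a prior whose tail falls like 1/sqrt(n) (slower than 1/n)."""
    return n ** -0.5

for n in [10**2, 10**4, 10**6, 10**8]:
    print(f"N = {n:>10}: "
          f"N * P_thin(>N) = {n * tail_thin(n):9.2e}   "
          f"N * P_heavy(>N) = {n * tail_heavy(n):9.2e}")

# For the thin tail the product shrinks toward zero, so "just posit more people"
# never wins an SIA-style update; for the heavy tail it grows without bound.
```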
My prior for real-world security lines does have this property, though this cheats a little by being largely founded in real-world experience already. Does my prior for population of hypothetical worlds involving Truman Show style conspiracies (or worse!) have this property? I don’t know—maybe not?
Does it even make sense to have a prior over these? After all a prior still requires some sort of model that you can use to expect things or not, and I have no reasonable models at all for such worlds. A mathematical “universal” prior like Solomonoff is useless since it’s theoretically uncomputable, and also in a more practical sense utterly disconnected from the domain of properties such as “America’s population”.
On the whole though, your point is quite correct that for many priors you can’t “integrate the extreme tails” to get a significant effect. The tails of some priors are just too thin.
While you’re quite right about numbers on the scale of billions or trillions, I don’t think it makes sense, in the limit, for the prior probability of X people existing in the world to fall faster than 1/X as X grows.
Certain series of large numbers grow larger much faster than they grow in complexity. A program that returns 10^(10^(10^10)) takes fewer bits to specify (relative to most reasonable systems of specifying programs) than a program that returns 32758932523657923658936180532035892630581608956901628906849561908236520958326051861018956109328631298061259863298326379326013327851098368965026592086190862390125670192358031278018273063587236832763053870032004364702101004310417647840155719238569120561329853619283561298215693286953190539832693826325980569123856910536312892639082369382562039635910965389032698312569023865938615338298392306583192365981036198536932862390326919328369856390218365991836501590931685390659103658916392090356835906398269120625190856983206532903618936398561980569325698312650389253839527983752938579283589237325987329382571092301928* - even though 10^(10^(10^10)) is by far the larger number. And it only takes a linear increase in complexity to make it 10^(10^(10^(10^(10^(10^10))))) instead.
*I produced this number via keyboard-mashing; it’s not anything special.
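A rough way to see this in code, using source-expression length as a crude stand-in for program complexity (it isn’t Kolmogorov complexity, just an illustration), with a random digit string standing in for the keyboard-mashed number:

```python
import random

# Length of the *description*, not the value. Expression length in Python source
# is only a crude proxy for program complexity, used here for illustration.

tower = "10**10**10**10"   # evaluates (right-associatively) to 10^(10^(10^10)); far too large to compute
mashed = "".join(random.choice("0123456789") for _ in range(600))  # stand-in for a 600-digit mashed number

print(len(tower))          # 14 characters, describing an astronomically large value
print(len(mashed))         # 600 characters, describing a comparatively tiny one

# Stacking one more exponent adds only a constant number of characters,
# so description length grows linearly while the value explodes:
bigger_tower = "10**" + tower    # 10^(10^(10^(10^10))), only 4 characters longer
print(len(bigger_tower))         # 18
```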
Consider the proposition “A superpowered entity capable of creating unlimited numbers of people sampled a random program from the space of all possible programs, weighted by the complexity of those programs (with their outputs read as integers), ran it, and then created that many people.”
If this happened, the probability that the program outputs at least X would fall much more slowly than 1/X as X grows. The sum doesn’t converge at all; the expected number of people created would be literally infinite.
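Here’s a toy version of that mixture, with the details invented purely for illustration: give complexity-k programs prior weight 2^-k, and suppose at least one program of each complexity outputs something double-exponential in k (real program outputs, like busy beaver numbers, grow far faster still). The terms of the expectation already blow up:

```python
# Toy complexity-weighted mixture. The specific choices (2**-k prior weight,
# double-exponential outputs) are illustrative, not anything canonical.

def log2_prior_weight(k):
    # Each extra bit of complexity halves the prior: weight 2**-k.
    return -k

def log2_output(k):
    # An output of size 2**(2**k): double-exponential growth in complexity.
    return 2 ** k

# The k-th term of the expected-population sum is (2**-k) * 2**(2**k).
# Its base-2 log is 2**k - k, which grows without bound, so the terms
# don't even go to zero and the expectation is infinite.
for k in range(1, 11):
    print(f"k = {k:>2}: log2(prior * output) = {log2_output(k) + log2_prior_weight(k)}")
```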
So as long as you assign greater than literally zero probability to that proposition (and there’s no such thing as zero probability), there must exist some number X such that you assign greater than 1/X probability to at least X people existing. In fact, there must exist some X such that you assign greater than 1/X probability to at least X million people existing, or X billion, and so on.
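Spelled out slightly more formally (a sketch; H, ε and N are just labels I’m introducing for the proposition, its probability, and the number of people created):

```latex
% H = the proposition about the superpowered entity, \varepsilon = P(H) > 0,
% N = the number of people created. As argued above, under H the tail falls
% more slowly than 1/X, i.e. X \cdot P(N \ge X \mid H) \to \infty as X \to \infty.
\[
  P(N \ge X) \;\ge\; P(H)\,P(N \ge X \mid H) \;=\; \varepsilon\,P(N \ge X \mid H).
\]
% Since X \cdot \varepsilon\,P(N \ge X \mid H) \to \infty as well, there is some
% X_0 with \varepsilon\,P(N \ge X_0 \mid H) > 1/X_0, hence P(N \ge X_0) > 1/X_0.
% The same argument goes through with "X people" replaced by "X million",
% "X billion", and so on.
```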
(btw, I don’t think that the sort of SIA-based reasoning here is actually valid; but if it were, then yeah, it would imply that there are infinitely many people.)
I think when you get to any class of hypotheses like “capable of creating unlimited numbers of people” with nonzero probability, you run into multiple paradoxes of infinity.
For example, there is no uniform distribution over any countably infinite set, which includes the set of all halting programs. Every non-uniform distribution this hypothetical superbeing may have used over such programs is a different prior hypothesis. The set of these has no suitable uniform distribution either, since they can be partitioned into countably many equivalence classes under natural transformations.
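The first claim is just the standard countable-additivity argument, sketched here:

```latex
% Suppose a uniform probability measure existed on a countably infinite set
% S = \{s_1, s_2, s_3, \dots\}, giving every point the same mass p.
% Countable additivity forces
\[
  1 \;=\; P(S) \;=\; \sum_{i=1}^{\infty} P(\{s_i\}) \;=\; \sum_{i=1}^{\infty} p
  \;=\;
  \begin{cases}
    0 & \text{if } p = 0,\\
    \infty & \text{if } p > 0,
  \end{cases}
\]
% which is impossible either way, so no such uniform measure exists.
```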
It doesn’t take much study of this before you’re digging into pathologies of measure theory such as Vitali sets and similar.
You can of course arbitrarily pick any of these weightings to be your “chosen” prior, but that’s just equivalent to choosing a prior over population directly so it doesn’t help at all.
Probability theory can’t adequately deal with such hypothesis families, and so if you’re considering Bayesian reasoning you must discard them from your prior distribution. Perhaps there is some extension or replacement for probability that can handle them, but we don’t have one.
Scott Alexander says:
I am often confused by the kind of reasoning at play in the text I bolded. Maybe someone can help sort me out. As I increase the number of people in the conspiracy world, my prior in that world also decreases. If my prior falls faster than the number of people in the considered world grows, I will not be able to construct a conspiracy-world that allows the thought experiment to bite.
Consider the situation where I arrive at the airport and will wait in line at security. Wouldn’t I be more likely to discover a line 1000 people long than 100 people long? I am 10x more likely to exist in the longer line. The problem is that our prior on 1000-person security lines might be very low. The reasoning on display in the above passage would invite us to simply crank up the length of the line, say, to 1 million people. I suspect that SIA proponents don’t show up at the airport expecting lines this long. Why? Because the prior on a million-person line is more than a thousand times lower than the prior on a 100-person line.
This also applies to some presentations of Pascal’s mugging.
This point was recently elaborated on here: Pascal’s Mugging and the Order of Quantification