The SIA population update can be surprisingly small
With many thanks to Damon Binder, and the spirited conversations that led to this post, and to Anders Sandberg.
People often think that the self-indication assumption (SIA) implies a huge number of alien species, millions of times more than otherwise. Thought experiments like the presumptuous philosopher seem to suggest this.
But here I’ll show that, in many cases, updating on SIA doesn’t change the expected number of alien species much. It all depends on the prior, and there are many reasonable priors for which the SIA update does nothing more than double the probability of life in the universe[1].
This can be the case even if the prior says that life is very unlikely! We can have a situation where we are astounded, flabbergasted, and disbelieving about our own existence—“how could we exist, how can this beeeeee?!?!?!?”—and still not update much—“well, life is still pretty unlikely elsewhere, I suppose”.
In the one situation where we have an empirical distribution, the “Dissolving the Fermi Paradox” paper, the effect of the SIA anthropics update is to multiply the expected number of civilizations per planet by seven. Not seven orders of magnitude—just seven.
The formula
Let ρ∈[0,1] be the probability of advanced space-faring life evolving on a given planet; for the moment, ignore issues of life expanding to other planets from their one point of origin. Let f be the prior distribution of ρ, with mean μ and variance σ2. This means that, if we visit another planet, our probability of finding life is μ.
On this planet, we exist[2]. Then if we update on our existence we get a new distribution f′; this distribution will have mean μ′:
μ′=μ(1+σ2/μ2).
To see a proof of this result, look at this footnote[3].
Define Mμ,σ2=1+σ2/μ2 to be this multiplicative factor between μ and μ′; we’ll show that there are many reasonable situations where Mμ,σ2 is surprisingly low: think 2 to 100, rather than in the millions or billions.
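As a quick sanity check, here is a minimal Python sketch using beta priors, for which both sides can be computed exactly: updating Beta(α,β) on an observation of life gives Beta(α+1,β) (as discussed below), whose mean should match μ(1+σ2/μ2).

```python
from fractions import Fraction

def beta_mean_var(a, b):
    """Exact mean and variance of a Beta(a, b) distribution."""
    mean = Fraction(a, a + b)
    var = Fraction(a * b, (a + b) ** 2 * (a + b + 1))
    return mean, var

def anthropic_mean(a, b):
    """mu' = mu * (1 + var/mu^2) for a Beta(a, b) prior."""
    mu, var = beta_mean_var(a, b)
    return mu * (1 + var / mu ** 2)

# Updating Beta(a, b) on our existence gives Beta(a+1, b), whose mean is
# (a+1)/(a+b+1); the formula reproduces this exactly.
for a, b in [(1, 1), (2, 2), (1, 10), (3, 7)]:
    assert anthropic_mean(a, b) == Fraction(a + 1, a + b + 1)
```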
Beta distributions I
Let’s start with the most uninformative prior of all: a uniform prior over [0,1]. The expectation of ρ is ∫₀¹ρdρ=1/2, so, without any other information, we expect a planet to have life with 50% probability. The variance is σ2=1/12.
Thus if we update on our existence on Earth, we get the posterior f′(ρ)=2ρ; the mean of this is 2/3 (either direct calculation or using M1/2,1/12=1+4/12=4/3).
Even though this change in expectation is multiplicatively small, it does seem that the uniform prior and the f′(ρ) are very different, with f′(ρ) heavily skewed to the right. But now consider what happens if we look at Mars and notice that it hasn’t got life. The probability of no life, given ρ, is 1−ρ. Updating on this and renormalising gives a posterior 6ρ(1−ρ):
The expectation of 6ρ(1−ρ), symmetric around 1/2, is of course 1/2. Thus one extra observation (that Mars is dead) has undone, in expectation, all the anthropic impact of our own existence.
This is an example of a beta distribution for α=2 and β=2 (yes, beta distributions have a parameter called β and another one that’s α; just deal with it). Indeed, the uniform prior is also a beta distribution (with α=β=1) as is the anthropic updated version 2ρ (which has α=2, β=1).
The update rule for beta distributions is that a positive observation (ie life) increases α by 1, and a negative observation (a dead planet) increases β by 1. The mean of an updated beta distribution is a generalised version of Laplace’s law of succession: if our prior is a beta distribution with parameters α and β, and we’ve had m positive observations and n negative ones, then the mean of the posterior is:
(α+m)/(α+β+m+n).
Suppose now that we have observed n dead planets, and no life, and that we haven’t done an anthropic update yet. Then the probability of life is α/(α+β+n). Upon adding the anthropic update, this shifts to (α+1)/(α+β+n+1), meaning that the multiplicative factor is at most (α+1)/α. If we started with the uniform prior with its α=1, the anthropic update multiplies the probability of life by at most 2. In a later section, we’ll look at α<1.
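In code, with exact fractions (a sketch of the update rule just described, treating the anthropic update as one extra “life” observation):

```python
from fractions import Fraction

def mean_after(alpha, beta, m, n):
    """Generalised Laplace's law: posterior mean of rho after m planets
    with life and n dead planets, starting from Beta(alpha, beta)."""
    return Fraction(alpha + m, alpha + beta + m + n)

alpha, beta = 1, 1   # the uniform prior
for n in range(10):  # n dead planets observed, no life seen elsewhere
    before = mean_after(alpha, beta, 0, n)
    after = mean_after(alpha, beta, 1, n)   # add the anthropic update
    assert after / before <= Fraction(alpha + 1, alpha)  # at most a doubling
```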
High prior probability is not required for weak anthropic update
The uniform prior has α=β=1 and starts at expectation 1/2. But we can set α=1 and a much higher β, which skews the distribution to the left; for example, for β=2, 3, and 10:
Even though these priors are skewed to the left, and have lower prior probabilities of life (1/3, 1/4, and 1/11), the anthropic update has a factor Mμ,σ2 that is less than 2.
Also note that if we scale the prior f by a small ϵ, replacing f(ρ) on the range [0,1] with f(ρ/ϵ)/ϵ on the range [0,ϵ], then μ is multiplied by ϵ and σ2 by ϵ2. Thus Mμ,σ2 is unchanged. Here, for example, is the uniform distribution, scaled down by ϵ=1, ϵ=1/3, and ϵ=1/20:
All of these will have the same Mμ,σ2 (which is 4/3, just as for the uniform distribution). And, of course, doing the same scaling with the various beta distributions we’ve seen up until now will also keep Mμ,σ2 constant.
Thus there are a lot of distributions with very low μ (ie very low prior probability of life) but an Mμ,σ2 that’s less than 2 (ie the anthropic update is less than a doubling of the probability of life).
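The scale-invariance of Mμ,σ2 is easy to verify (a sketch, using the uniform-on-[0,ϵ] mean ϵ/2 and variance ϵ2/12 from above):

```python
def M(mu, var):
    """The multiplicative factor M = 1 + var/mu^2."""
    return 1 + var / mu ** 2

# Uniform on [0, eps]: mean eps/2, variance eps^2/12 -- M is always 4/3,
# however small the prior probability of life becomes.
for eps in [1, 1 / 3, 1 / 20, 1e-9]:
    assert abs(M(eps / 2, eps ** 2 / 12) - 4 / 3) < 1e-9
```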
Beta distributions II and log-normals
The best-case scenario for Mμ,σ2 is if f assigns probability 1 to ρ=μ. In that case, σ2=0 and M=1: the anthropic update changes nothing.
Conversely, the worst-case scenario for Mμ,σ2 is if f only allows ρ=0 and ρ=1. In that case, f assigns probability μ to 1 and 1−μ to 0, for a mean of μ and a variance of σ2=μ−μ2, and a multiplicative factor of Mμ,σ2=1/μ. In this case, after anthropic update, f′ assigns certainty to ρ=1 (since any life at all, given this f, means life on all planets).
But there are also more reasonable priors with large Mμ,σ2. We’ve already seen some, implicitly, above: the beta distributions with α<1. In that case, Mμ,σ2 is bounded by (α+1)/α. If α=3/4 and β=1, for instance, this corresponds to the (unbounded) distribution f(ρ)=(3/4)ρ−1/4; the multiplicative factor is below 7/3, which is slightly above 2. But as α declines, this bound grows surprisingly fast: at α=1/2 it is 3, and at α=1/4 it is 5:
In general, for α=1/n, the multiplicative factor is bounded by n+1. This gets arbitrarily large as α→0. Though α=0 itself corresponds to the improper prior f(ρ)=1/ρ, whose integral diverges. On a log scale, this corresponds to the log-uniform distribution, which is roughly what you get if you assume “we need N steps, each of probability p, to get life; let’s put a uniform prior over the possible Ns”.
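For beta priors, the bound (α+1)/α can be checked numerically; a sketch (using the closed-form Beta(α,β) moments, for which the factor works out to 1+β/(α(α+β+1))):

```python
def beta_M(a, b):
    """M = 1 + var/mean^2 for a Beta(a, b) prior."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return 1 + var / mean ** 2

for a in [1, 3 / 4, 1 / 2, 1 / 4, 1 / 10]:
    bound = (a + 1) / a            # 2, 7/3, 3, 5, 11
    for b in [1, 10, 1000]:
        assert beta_M(a, b) < bound
    # the bound is approached from below as beta grows:
    assert abs(beta_M(a, 10 ** 7) - bound) < 1e-4
```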
It’s not clear why one might want to choose α=1/10^20 for a prior, but there is a class of priors that is much more natural: the log-normal distributions. These are random variables X such that log(X) is normally distributed.
If we choose log(X) to have a highly negative mean (and a variance that isn’t too large), then we can mostly ignore the fact that X takes values above 1, and treat it as a prior distribution for ρ. The mean and variance of a log-normal distribution have closed forms, giving the multiplicative factor as:
Mμ,σ2=exp(¯σ2).
Here, ¯σ2 is the variance of the normally distributed log(X). This ¯σ2 might be large, as it encodes (roughly) “we need N steps, each of probability p, to get life; let’s put a uniform-ish prior over a range of possible Ns”. Unlike 1/ρ, this is a proper prior, and a plausible one; therefore there are plausible priors with very large Mμ,σ2. The log-normal is quite likely to appear, as it is the approximate limit of multiplying together a host of different independent parameters.
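This identity follows from the standard log-normal moments: if log(X) has mean m and variance s², then E(X)=exp(m+s²/2) and Var(X)=(exp(s²)−1)exp(2m+s²), so M=exp(s²) regardless of m. A quick check:

```python
import math

def lognormal_M(m, s2):
    """M = 1 + var/mean^2 for a log-normal X with log(X) ~ N(m, s2)."""
    mean = math.exp(m + s2 / 2)
    var = (math.exp(s2) - 1) * math.exp(2 * m + s2)
    return 1 + var / mean ** 2

# M depends only on the variance of log(X), not on its (very negative) mean.
for m in [-5.0, -30.0, -100.0]:
    for s2 in [0.25, 1.0, 4.0]:
        assert abs(lognormal_M(m, s2) - math.exp(s2)) < 1e-6 * math.exp(s2)
```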
Multiplication law
Do you know what’s more likely to be useful than “the approximate limit of multiplying together a host of different independent parameters”? Actually multiplying together independent parameters.
The famous Drake equation is:
R∗⋅fp⋅ne⋅fl⋅fi⋅fc⋅L.
Here R∗ is the number of stars in our galaxy, fp the fraction of those with planets, ne the number of planets that can support life per star that has planets, fl the fraction of those that develop life, fi the fraction of those that develop intelligent life, fc the fraction of those that release detectable signs of their existence, and L is the length of time those civilizations endure as detectable.
Then the proportion of advanced civilizations per planet is q⋅fl⋅fi, where q is the proportion of life-supporting planets among all planets. To compute the M of this distribution, we have the highly useful result (the proof is in this footnote[4]):
Let Xi be independent random variables with multiplicative factors Mi, and let M be the multiplicative factor of X=X1⋅X2⋅…⋅Xn. Then M=∏iMi: the total M is the product of the individual Mi.
The paper “Dissolving the Fermi Paradox” gives estimated distributions for all the terms in the Drake equation. The q, which doesn’t appear in that paper, is a constant, so has Mq=1. The fi has a log-uniform distribution from 0.001 to 1; the M can be computed from the mean and variance of such distributions, giving Mfi=log(1/0.001)⋅(1−0.001²)/(2(1−0.001)²)≈3.5.
The fl term is more complicated; it is distributed like g(X)=1−exp(−10^(50X)) where X is a standard normal distribution. Fortunately, we can estimate its mean and variance without having to figure out its density, by numerically integrating g(x) and g(x)² against the normal distribution. This gives μ≈0.5, σ2≈0.25 and M≈2. The overall multiplicative effect of the anthropic update is:
Mplanet≈7.
What if we considered the proportion of advanced civilizations per star, rather than per planet? Then we can drop the q term and add in fp and ne. Those are both estimated to be distributed as log-uniform on [0.1,1], for a total M of
Mstar≈14.
Why is the M higher for civilizations per star than civilizations per planet? That’s because when we update on our existence, we increase the proportion of civilizations per planet, but we also update the proportion of planets per star—both of these can make life more likely. The Mstar incorporates both effects, so is strictly higher than Mplanet.
We can do the same by considering the number of civilizations per galaxy; then we have to incorporate R∗ as well. This is log-uniform on [1,100], giving:
Mgalaxy≈32.
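These per-planet, per-star, and per-galaxy multipliers can be reproduced from the multiplication law and the closed-form moments of a log-uniform distribution (a Python sketch; the ≈2 value for Mfl is taken from the numerical integration above rather than re-derived):

```python
import math

def loguniform_M(a, b):
    """M = E(X^2)/E(X)^2 for X log-uniform on [a, b]:
    mean = (b-a)/log(b/a), E(X^2) = (b^2 - a^2)/(2*log(b/a))."""
    L = math.log(b / a)
    mean = (b - a) / L
    ex2 = (b * b - a * a) / (2 * L)
    return ex2 / mean ** 2

M_fi = loguniform_M(0.001, 1)                   # ~3.5, as in the text
M_fl = 2.0                                      # from the numerical integration above
M_planet = 1.0 * M_fi * M_fl                    # M_q = 1 for the constant q
M_star = M_planet * loguniform_M(0.1, 1) ** 2   # add f_p and n_e
M_galaxy = M_star * loguniform_M(1, 100)        # add R*

assert 6.5 < M_planet < 7.5    # ~7
assert 13 < M_star < 15        # ~14
assert 31 < M_galaxy < 34      # ~32
```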
What about if we include the Fermi observation (the fact that we don’t see anything in our galaxy)? The “Dissolving the Fermi Paradox” paper shows there are multiple different ways of including this update, depending on how we parse out “not seeing anything” and how easy it is for civilizations to expand.
I did a crude estimate here by taking the Fermi observation to mean “the proportion of civilizations per galaxy must be less than one”. Then I did a Monte-Carlo simulation, ignoring all results above 0 on the log scale:
From this, I got an estimated mean of 0.027, variance of 0.014, and a total multiplier of:
Mgalaxy, Fermi≈21.
With the Fermi observation and the anthropic update combined, we expect 0.56 civilizations per galaxy.
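That crude estimate can be sketched as a short simulation (my own reconstruction, using the log-uniform distributions quoted above and the fl form from earlier; since the paper’s exact parameterisation differs, the resulting mean and M only roughly track the figures above):

```python
import math, random

random.seed(0)

def log_uniform(a, b):
    """One sample from a log-uniform distribution on [a, b]."""
    return math.exp(random.uniform(math.log(a), math.log(b)))

def sample_fl():
    """f_l = 1 - exp(-10^(50X)) for standard normal X; the exponent is
    clamped since for 10^t > 100 the result is already 1.0 in floats."""
    t = 50 * random.gauss(0.0, 1.0)
    return 1.0 if t > 2 else -math.expm1(-(10.0 ** t))

kept = []
for _ in range(100_000):
    n_civ = (log_uniform(1, 100)        # R*
             * log_uniform(0.1, 1)      # f_p
             * log_uniform(0.1, 1)      # n_e
             * sample_fl()              # f_l
             * log_uniform(0.001, 1))   # f_i
    if n_civ < 1:                       # the crude Fermi observation
        kept.append(n_civ)

mu = sum(kept) / len(kept)
var = sum((x - mu) ** 2 for x in kept) / len(kept)
M = 1 + var / mu ** 2
```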
Limitations of the multiplier
Low multiplier, strong effects
It’s important to note that the anthropic update can be very strong, without changing the expected population much. So a low Mμ,σ2 doesn’t necessarily mean a low impact.
Consider for instance the presumptuous philosopher, slightly modified to use planetary population densities. Thus theory T1 predicts ρ=1/10^12 (one in a trillion) and T2 predicts ρ=1; we put initial probabilities of 1/2 on both theories.
As Nick Bostrom noted, the SIA update pushes T2 to being a trillion times more probable than T1; a posteriori, T2 is roughly a certainty (the actual probability is 10^12/(10^12+1)).
However, the expected population goes from roughly 1/2 (the average of 1/10^12 and 1) to roughly 1 (since a posteriori T2 is almost certain). This gives an Mμ,σ2 of roughly 2. So, despite the strong update towards T2, the actual population update is small—and, conversely, despite the actual population update being small, we have a strong update towards T2.
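The numbers in this example can be checked exactly with rational arithmetic (a sketch of the two-point mixture):

```python
from fractions import Fraction

rho1, rho2 = Fraction(1, 10 ** 12), Fraction(1)  # T1's and T2's predictions
p = Fraction(1, 2)                               # prior weight on T1

mu = p * rho1 + (1 - p) * rho2                   # prior mean, ~1/2
ex2 = p * rho1 ** 2 + (1 - p) * rho2 ** 2        # prior E(rho^2)
M = ex2 / mu ** 2                                # = 1 + var/mu^2

# Posterior weight on T2 after the SIA update (weights multiplied by rho)
post_T2 = (1 - p) * rho2 / (p * rho1 + (1 - p) * rho2)

assert abs(float(M) - 2) < 1e-10     # the expected population barely doubles...
assert float(post_T2) > 1 - 1e-11    # ...while T2 becomes a near-certainty
```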
Combining multiple theories
Note that both T1 and T2 in the previous section were point estimates: they posit a constant ρ. So they have a variance of zero, and hence an Mμ,σ2 of 1. But T2 has a much stronger anthropic update. Thus we can’t use their Mμ,σ2 to compare the anthropic effects of different theories.
We also can’t relate the individual Ms to that of a combined theory. As we’ve seen, T1 and T2 have Ms of 1, but the combined theory (1/2)T1+(1/2)T2 has an M of roughly 2. But we can play around with the relative initial weight of T1 and T2 to get other Ms.
If we started with odds 10^12:1 on T1 vs T2, then this has a mean ρ of roughly 10^−12; the anthropic update sends it to 1:1 odds, with a mean of roughly 1/2. So this combined theory has an M of roughly 10^12/2, half a trillion.
But, conversely, if we started with odds 1:10^12 on T1 vs T2, then we have an initial mean of ρ of roughly one; the anthropic update gives odds of 1:10^24, also with a mean of roughly one. So this combined theory has an M of roughly 1.
There is a weak relation between M and the Mi of the various Ti. Let each Ti have multiplier Mi; we can reorder the Ti so that Mi≤Mj for i≤j. Let T be a combined theory that assigns probability pi to Ti.
For all {pi}, M≥mini(Mi).
For all ϵ>0, there exist {pi} with all pi>0, such that M<mini(Mi)+ϵ.
So, the minimum value of the Mi is a lower bound on M, and we can get arbitrarily close to that bound. See the proof in this footnote[5].
[1] As we’ll see, the population update is small even in the presumptuous philosopher experiment itself.
[2] Citation partially needed: I’m ignoring Boltzmann brains, simulations, and similar ideas.
[3] Given a fixed ρ, the probability of observing life on our own planet is exactly ρ. So Bayes’s theorem implies that f′(ρ)∝ρf(ρ). With the full normalisation, this is
f′(ρ)=ρf(ρ)/∫₀¹ρf(ρ)dρ.
If we want to get the mean μ′ of this distribution, we further multiply by ρ and integrate:
μ′=Ef′(ρ)=∫₀¹ρ⋅[ρf(ρ)/∫₀¹ρf(ρ)dρ]dρ=(∫₀¹ρ²f(ρ)dρ)/(∫₀¹ρf(ρ)dρ).
Let’s multiply this by 1=(∫₀¹f(ρ)dρ)/(∫₀¹f(ρ)dρ) and regroup the terms:
μ′=[(∫₀¹ρ²f(ρ)dρ)/(∫₀¹f(ρ)dρ)]⋅[(∫₀¹f(ρ)dρ)/(∫₀¹ρf(ρ)dρ)].
Thus μ′=Ef(ρ²)/Ef(ρ)=(σ²+μ²)/μ=μ(1+σ²/μ²), using the fact that the variance is the expectation of ρ² minus the square of the expectation of ρ.
I adapted the proof in this post.
[4] Let Xi be independent random variables with means μi and variances σi². Let X=∏iXi, with mean μ and variance σ². Due to the independence of the Xi, the expectation of their product is the product of their expectations. Note that Xi² and Xj² are also independent if i≠j. Then we have:
∏iMμi,σi²=∏i(1+σi²/μi²)=∏i((μi²+σi²)/μi²)=∏i(E(Xi²)/E(Xi)²)=(∏iE(Xi²))/(∏iE(Xi)²)=E(X²)/E(X)²=(μ²+σ²)/μ²=1+σ²/μ²=Mμ,σ2.
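The multiplication law is easy to spot-check by sampling (a sketch; the tolerance just allows for Monte Carlo noise):

```python
import random
random.seed(1)

def M_of(samples):
    """Empirical M = 1 + var/mean^2 of a list of samples."""
    mu = sum(samples) / len(samples)
    var = sum((x - mu) ** 2 for x in samples) / len(samples)
    return 1 + var / mu ** 2

n = 200_000
xs = [random.uniform(0, 1) for _ in range(n)]      # M = 4/3
ys = [random.betavariate(2, 5) for _ in range(n)]  # an independent factor
prod = [x * y for x, y in zip(xs, ys)]

# The M of the product is the product of the Ms (up to sampling noise).
assert abs(M_of(prod) - M_of(xs) * M_of(ys)) < 0.05
```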
[5] Let {fi}1≤i≤n be probability distributions on ρ, with means μi, variances σi², second moments si=Efi(ρ²)=σi²+μi², and Mi=si/μi². Without loss of generality, reorder the fi so that Mi≤Mj for i<j.
Let f be the probability distribution f=p1f1+…+pnfn, with associated multiplier M. We’ll show that M≥M1.
We’ll first show this in the special case where n=2 and M1=M2, then generalise to the general case, as is appropriate for a generalisation. If s1/μ1²=M1=M2=s2/μ2², then, since all terms are non-negative, there exists an α such that s2=α²s1 while μ2=αμ1. Then for any given p=p1, the M of f is:
M(p)=(p⋅s1+(1−p)⋅s2)/(p⋅μ1+(1−p)⋅μ2)²=(p⋅s1+(1−p)⋅α²s1)/(p⋅μ1+(1−p)⋅αμ1)²=M1⋅(p+α²(1−p))/(p+α(1−p))².
The function x→x² is convex, so, interpolating between the values x=1 and x=α, we know that for all 0≤p≤1, the term (p+α(1−p))² is at most p+α²(1−p). Therefore (p+α²(1−p))/(p+α(1−p))² is at least 1, and M(p)≥M1. This shows the result for n=2 if M1=M2.
Now assume that M2>M1, so that s1/μ1²<s2/μ2². Then replace s2 with a lower value s2′, chosen so that s1/μ1²=s2′/μ2². If we define M′(p) as the expression for M(p) with s2′ substituting for s2, then M′(p)≤M(p), since s2′<s2. The previous result shows that M′(p)≥M1, thus M(p)≥M1 too.
To show the result for larger n, we’ll induct on n. For n=1 the result is a tautology, M1≥M1, and we’ve shown the result for n=2. Assume the result is true for n−1, and notice that f=p1f1+…+pnfn can be re-written as f=p1f1+(1−p1)f′, where f′=p′2f2+…+p′nfn for p′i=pi/(1−p1). Then, by the induction hypothesis, if M′ is the M of f′, then M′≥M2. Applying the result for n=2 to f1 and f′ gives M≥min(M1,M′). Since M1≤M2 and M′≥M2, we know that min(M1,M′)=M1, proving the general result.
To show M can get arbitrarily close to M1, simply note that M is continuous in the {pi}; define p1=1−ϵ and pi=ϵ/(n−1) for i>1, and let ϵ tend to 0.
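The lower bound can also be spot-checked numerically on random mixtures (a sketch; only each component’s mean and second moment matter):

```python
import random
random.seed(2)

def mix_M(mus, ss, ps):
    """M of the mixture sum_i p_i f_i, where each f_i has mean mu_i and
    second moment s_i = E(rho^2); M = E(rho^2)/E(rho)^2 = 1 + var/mean^2."""
    mu = sum(p * m for p, m in zip(ps, mus))
    s = sum(p * s_ for p, s_ in zip(ps, ss))
    return s / mu ** 2

# Random mixtures: the mixture's M never drops below the smallest component M.
for _ in range(1000):
    n = random.randint(2, 5)
    mus = [random.uniform(0.01, 1.0) for _ in range(n)]
    ss = [m ** 2 * (1 + random.uniform(0.0, 5.0)) for m in mus]  # M_i in [1, 6]
    weights = [random.random() for _ in range(n)]
    total = sum(weights)
    ps = [w / total for w in weights]
    Ms = [s_ / m ** 2 for s_, m in zip(ss, mus)]
    assert mix_M(mus, ss, ps) >= min(Ms) - 1e-9
```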