Insights from the randomness/ignorance model are genuine
(Based on the randomness/ignorance model proposed in parts 1, 2, and 3.)
The bold claim of this sequence thus far is that the randomness/ignorance model solves a significant part of the anthropics puzzle. (Not everything since it’s still incomplete.) In this post I argue that this “solution” is genuine, i.e. it does more than just redefine terms. In particular, I argue that my definition of probability for randomness is the only reasonable choice.
The only axiom I need for this claim is that probability must be consistent with betting odds in all cases: if H comes true in two of three situations where O is observed, and this is known, then P(H|O) needs to be 2/3, and no other answer is acceptable. This idea isn’t new; the problem with it is that it doesn’t actually produce a definition of probability, because we might not know how often H comes true if O is observed. It cannot define probability in the original Presumptuous Philosopher problem, for example.
But in the context of the randomness/ignorance model, the approach becomes applicable. Stating my definition for when uncertainty is random in one sentence, we get
Your uncertainty about H, given observation O, is random iff you know the relative frequency with which H happens, evaluated across all observations that, for you, are indistinguishable from O with regard to H.
Here “relative frequency” is the frequency of H compared to not-H, i.e. you know in what fraction of those cases H happens. A good look at this definition shows that it is precisely the condition needed to apply the betting odds criterion. So the model simply divides everything into those cases where you can apply betting odds and those where you can’t.
If the Sleeping Beauty experiment is repeated sufficiently often using a fair coin, then roughly half of all experiments will run in the 1-interview version, and the other half will run in the 2-interview version. In that case, Sleeping Beauty’s uncertainty is random and the reasoning from part 3 goes through to output 2/3 for it being Monday. The experiment being repeated sufficiently often might be considered a reasonably mild restriction; in particular, it is a given if the universe is large enough that everything which appears once appears many times. Given that Sleeping Beauty is still controversial, the model must thus be either nontrivial or wrong, hence “genuine”.
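The frequency claim can be checked with a small simulation. This is only a sketch under stated assumptions: heads is taken to mean the 1-interview version (Monday only) and tails the 2-interview version (Monday and Tuesday), and the function name and seed are illustrative rather than part of the model.

```python
import random


def simulate_sleeping_beauty(num_experiments, seed=0):
    """Count interviews across many repetitions of the experiment.

    Assumption: heads -> one interview (Monday),
                tails -> two interviews (Monday and Tuesday).
    Returns (fraction of interviews on Monday,
             fraction of interviews that follow heads).
    """
    rng = random.Random(seed)
    monday_interviews = 0
    tuesday_interviews = 0
    heads_interviews = 0
    for _ in range(num_experiments):
        heads = rng.random() < 0.5  # fair coin
        monday_interviews += 1      # Monday interview happens either way
        if heads:
            heads_interviews += 1
        else:
            tuesday_interviews += 1  # extra interview only on tails
    total = monday_interviews + tuesday_interviews
    return monday_interviews / total, heads_interviews / total


freq_monday, freq_heads = simulate_sleeping_beauty(100_000)
print(f"fraction of interviews on Monday: {freq_monday:.3f}")
print(f"fraction of interviews after heads: {freq_heads:.3f}")
```

Counting per interview rather than per experiment is the design choice that matters here: it is what "evaluated across all indistinguishable observations" amounts to in this toy setting.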
Here is an alternative justification for my definition of random probability. Suppose H is the hypothesis we want to evaluate (like “today is Monday”) and O is the full set of observations we currently have (formally, the full brain state of Sleeping Beauty). Then what we care about is the value of P(H|O). Now consider the term P(H|O)/P(¬H|O); let’s call it X. If X is known, then P(H|O) can be computed as X/(1+X), so knowledge of X implies knowledge of P(H|O) and vice versa. But X is more “fundamental” than P(H|O), in the sense that it can be defined as the ratio of two frequencies. Take all situations in which O (or any other set of observations which, from your perspective, is indistinguishable from O) is observed, and count in how many of those H is true vs. false. The ratio of these two values is X.
A look at the above criterion for randomness shows that it’s just another way of saying that the value of X is known. Since, again, the value of X determines the value of P(H|O), this means that the definition of probability as betting odds, in the case that the relevant uncertainty is random, falls almost directly out of the formula.
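A minimal sketch of the conversion between a frequency ratio and a probability, assuming the ratio in question is the odds of the hypothesis being true vs. false among indistinguishable situations. The function names are illustrative, not taken from the posts.

```python
from fractions import Fraction


def odds_from_counts(true_count, false_count):
    """Ratio of situations where the hypothesis is true vs. false."""
    return Fraction(true_count, false_count)


def probability_from_odds(odds):
    """Turn an odds ratio into a probability: p = odds / (1 + odds)."""
    return odds / (1 + odds)


def odds_from_probability(p):
    """Inverse direction: odds = p / (1 - p)."""
    return p / (1 - p)


# Hypothesis true in two of three indistinguishable situations.
odds = odds_from_counts(2, 1)
prob = probability_from_odds(odds)
print(odds, prob)
```

Because each function inverts the other, knowing the frequency ratio and knowing the probability carry the same information.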
This seems like a step backwards from UDASSA, another potential solution to many anthropic problems. UDASSA has a completely formal specification, while this model relies on a somewhat unclear verbal definition. So you need to know the ‘relative frequency’ with which H happens. But what are we averaging over here? Our universe? All possible universes? If uncertain about which universe we are in, how should we average over the different universes? What if we are reasoning about an event which, as far as we know, will only happen once?
I have answers to all of these questions! I just haven’t posted them yet. If I present an entirely new theory in one super long post, then obviously no-one reads it. In fact, it would be irrational to read it because the prior that I’m onto something is just too low to invest the time. A sequence of short posts where each post makes a point which can be understood by anyone having read up to that post – that’s not optimal, but how else could you do it? This is a completely genuine question if you have an answer.
So the structure I’ve chosen is to first state the distinction, then lay out the model that deals with randomness only (because that already does some stuff which SIA and SSA can’t), then explain how to deal with ignorance, which makes the model complete, and then present a formalized version. The questions you just listed all deal with the ignorance part, the part that’s still in the pipeline.
Well, and I didn’t know I was competing with UDASSA, because I didn’t know it existed. For some reason it’s sitting at 38 karma, which makes it easy to miss, and you’re the first to bring it up. I’ll read it before I post anything else.
It’s true that UDASSA is tragically underrated, given that (it seems to me) it provides a satisfactory resolution to all anthropic problems. I think this might be a situation where people tend to leave the debate and move on to something else when they seem to have found a satisfactory position, like how most LW people don’t bother arguing about whether god exists anymore.
Well, not exactly. I came up with UDASSA originally but found it not entirely satisfactory, so I moved on to something that eventually came to be called UDT. I wrote down my reasons in “against UD+ASSA” and in a comment under Paul’s post.
Perhaps it would be good to have this history be more readily available to people looking for solutions to anthropic reasoning though, if you guys have suggestions on how to do that.
The solution to this kind of thing should be a wiki, I think. If the LessWrong wiki were kept up to date enough to have a page on anthropics, that would have solved the issue in this case and should work for many similar cases.
Right, I knew that many people had since moved on to UDT due to limitations of UDASSA for decision-making. What I meant was that UDASSA seems to be satisfactory at resolving the typical questions about anthropic probabilities, setting aside decision theory/noncomputability issues.
I agree it would be nice to have all this information in a readily accessible place. Maybe the posts setting out the ideas and later counter-arguments could be put in a curated sequence.
I actually knew about UDT. Enough to understand how it wins in Transparent Newcomb, but not enough to understand that it extends to anthropic problems.
This is what I’m doing. I haven’t read the entire thing yet, but this paragraph basically explains the key idea of my model. I was going to address how to count instances eventually (near the end), and it bottoms out at observer moments. The full idea, abbreviated, is “start with a probability distribution over different universes, in each one apply the randomness thing via counting observer moments, then weigh those results with your distribution”. This gives you intuitive results in Doomsday (no update), P/P (some bias towards larger universe depending on how strongly you believe in other universes), Sleeping Beauty (basically 1⁄3) and the “how do we update on X-risk given that we’re still alive” question (complicated).
It appears that I independently came up with ASSA, plus a different way of presenting it. And probably a weaker formalism.
I’m obviously unhappy about this, but thank you for bringing it to my attention now rather than later.
One reason I was assuming there couldn’t be other theories I was unaware of is that Stuart Armstrong was posting about anthropics and he seemed totally unaware.
Yeah, I also had similar ideas for solving anthropics a few years ago, and was surprised when I learned that UDASSA had been around for so long. At least you can take pride in having found the right answer independently.
I think that UDASSA gives P(heads) = 1⁄2 on the Sleeping Beauty problem due to the way it weights different observer-moments, proportional to 2^(-description length). This might seem a bit odd, but I think it’s necessary to avoid problems with Boltzmann brains and the like.
You mean P(Monday)? In that case it would be different, although it would have some similarity. Why is the description length of the Monday observer moment longer than the Tuesday one?
No, I mean Beauty’s subjective credence that the coin came up heads. That should be 1⁄2 by the nature of a coin flip. Then, if the coin comes up tails, you need 1 bit to select between the subjectively identical states of waking up on Monday or Tuesday. So in total:
P(heads, Monday) = 1⁄2,
P(tails, Monday) = 1⁄4
P(tails, Tuesday) = 1⁄4
(EDIT: actually this depends on how difficult it is to locate memories on Monday vs. Tuesday, which might be harder given that your memory has been erased. I think that for ‘natural’ ways of locating your consciousness it should be close to 1⁄2, 1⁄4, 1⁄4 though)
(DOUBLE EDIT, MUCH LATER: actually it now seems to me like the thirder position might apply here, since the density of spacetime locations with the right memories is higher in the tails branch than the heads)
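The 1⁄2, 1⁄4, 1⁄4 split can be reproduced with a toy weighting of 2^(-bits). This is only a sketch of the bit-counting argument in this comment, not an implementation of UDASSA: the bit counts are assumptions (1 bit for the coin, plus 1 extra bit to pick the day in the tails branch), and real description lengths would depend on how observer-moments are located.

```python
from fractions import Fraction

# Assumed number of bits needed to single out each observer-moment.
bits_to_locate = {
    ("heads", "Monday"): 1,   # coin outcome only
    ("tails", "Monday"): 2,   # coin outcome + which day
    ("tails", "Tuesday"): 2,  # coin outcome + which day
}

# Weight each observer-moment by 2^(-bits), then normalize.
weights = {om: Fraction(1, 2 ** b) for om, b in bits_to_locate.items()}
total_weight = sum(weights.values())
credence = {om: w / total_weight for om, w in weights.items()}

p_heads = sum(c for (coin, _), c in credence.items() if coin == "heads")
p_monday = sum(c for (_, day), c in credence.items() if day == "Monday")
print(credence)
print("P(heads) =", p_heads, " P(Monday) =", p_monday)
```

Changing the assumed bit counts (for example, if the tails-branch moments were as cheap to locate as the heads-branch one) shifts the result toward the thirder answer, which is the sensitivity the edits above point at.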
I guess I’m a bit out of the loop on questions about how to define uncertainty, so I’m a bit confused about what position you are against or how this is different from what others do. That is, it seems to me like you are trying to fix a problem you perceive in the way people currently think about uncertainty, but I’m not sure what that problem is, so I can’t yet understand how this framing might fix it. I’ve been reading this sequence of posts thinking “yeah, sure, this all sounds reasonable” but also without really understanding the context for it. I know you did the post on anthropics, but even there it wasn’t really that clear to me how this framing helps us over what is perhaps otherwise normally done, although perhaps that reflects my ignorance of existing arguments about what methods of anthropic reasoning are correct.
Yeah, I wrote this assuming people have the context.
So there’s a class of questions where standard probability theory doesn’t give clear answers. This was dubbed anthropics or anthropic probability. To deal with this, two principles were worked out, SSA and SIA, which are well-defined and produce answers. But for both of them, there are problems where their answers seem absurd.
I think the best way to understand the problem of anthropics is by looking at the Doomsday argument as an example. Consider all humans who will ever live (assuming they’re not infinitely many). Say that’s N many. For simplicity, we assume that there are only two cases, either humanity goes extinct tomorrow, in which case N is about sixty billion – but let’s make that $10^{11}$ for simplicity – or humanity flourishes and expands through the cosmos, in which case N is, say, $10^{18}$. Let’s call S the hypothesis that humans go extinct, and L the hypothesis that they don’t (that’s for “short” and “long” human history). Now we want to update on P(L) given the observation that you are human number n (so n will be about 30 billion). Let’s call that observation O. Also let p be your prior on L, so P(L)=p.
The Doomsday argument now goes as follows. The term P(O|L) is $10^{-18}$, because if L is true then there are a total of $10^{18}$ people, each position is equally likely, so $10^{-18}$ is just the chance to get your particular one. On the other hand, P(O|S) is $10^{-11}$, because if S is true there are only $10^{11}$ people total. So we simply apply Bayes on the observation O, and then use the law of total probability in the denominator to obtain
$$P(L|O) = \frac{P(O|L)P(L)}{P(O)} = \frac{10^{-18}p}{P(O|L)P(L) + P(O|\neg L)P(\neg L)} = \frac{10^{-18}p}{10^{-18}p + 10^{-11}(1-p)}$$
If p=0.999, this term equals about 0.0001. So even if you were very confident that humanity would make it, you should still assign only about 0.01% to that after updating. If you want to work it out yourself, this is where you should pause and think about what part of this is wrong.
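The update can be reproduced numerically. This is a sketch that plugs in the likelihoods exactly as stated in the setup, P(O|L) = 1/10^18 and P(O|S) = 1/10^11; the function and parameter names are illustrative.

```python
def doomsday_posterior_long(prior_long, n_short=10**11, n_long=10**18):
    """Posterior P(L|O) under the SSA-style likelihoods from the text.

    P(O|L) = 1 / n_long and P(O|S) = 1 / n_short: each birth rank is
    treated as equally likely among all humans who ever live.
    """
    likelihood_long = 1 / n_long
    likelihood_short = 1 / n_short
    numerator = likelihood_long * prior_long
    # Law of total probability, with S playing the role of not-L.
    evidence = numerator + likelihood_short * (1 - prior_long)
    return numerator / evidence


posterior = doomsday_posterior_long(0.999)
print(f"P(L|O) with prior 0.999: {posterior:.6g}")
```

The likelihood ratio between the two hypotheses is 10^-7, which is why even a 999:1 prior in favor of L is overwhelmed.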
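So the part that’s problematic is the probability for P(O|L). There is a hidden assumption that you had to be one of the humans who was actually born. This was then dubbed the Self-Sampling Assumption (SSA), namely: all other things equal, you should reason as if you were randomly selected from the set of all actually existing observers (past, present, and future) in your reference class.
So SSA endorses the Doomsday argument. The principled way to debunk this is the Self-Indication Assumption (SIA), which says: all other things equal, you should reason as if you were randomly selected from the set of all possible observers.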
If you apply SIA, then P(O|L)=P(O|S) and hence P(L|O)=P(L). Updating on O no longer does anything.
So this is the problem where SSA gives a stupid answer. The problem where SIA gives the stupid answer is the Presumptuous Philosopher problem: there are two theories of how large the universe is; according to one, it’s $10^9$ times as large as it is according to the other. If you apply the SIA rule, you get that the probability of living in the small universe is $\frac{1}{1+10^9}$ (if the prior was 1⁄2 on both).
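The Presumptuous Philosopher number can be checked with a short calculation. This is a sketch assuming that SIA reweights each theory’s prior by its number of observers, and that observer count scales with universe size; the names and the dictionary layout are illustrative.

```python
from fractions import Fraction


def sia_posterior(priors, observer_counts):
    """Reweight each hypothesis by its observer count, then normalize.

    Assumption: under SIA, a hypothesis with k times as many observers
    gets k times the weight, all else being equal.
    """
    weighted = {h: priors[h] * observer_counts[h] for h in priors}
    total = sum(weighted.values())
    return {h: w / total for h, w in weighted.items()}


priors = {"small": Fraction(1, 2), "large": Fraction(1, 2)}
observer_counts = {"small": 1, "large": 10**9}  # relative sizes only
sia_result = sia_posterior(priors, observer_counts)
print("P(small universe) =", sia_result["small"])
```

Only the ratio of observer counts matters, so the absolute size of either universe never has to be specified.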
There is also Full Non-indexical Conditioning which is technically a different theory, and it argues differently, but it outputs the same as SIA in every case, so basically there are just the two. And that, as far as I know, is the state of the art. No-one has come up with a theory that can’t be made to look ridiculous. Stuart Armstrong has made a bunch of LW posts about this recently-ish, but he hasn’t proposed a solution, he’s pointed out that existing theories are problematic. This one, for example.
I’ve genuinely spent a lot of time thinking really hard about this stuff, and my conclusion is that the “reason as if you’re randomly selected from a set of observers” thing is the key problem here. I think that’s the reason why this still hasn’t been worked out. It’s just not the right way to look at it. I think the relevant variable which everyone is missing is that there are two fundamentally different kinds of uncertainty, and if you structure your theory around that, everything works out. And I think I do have a theory where everything works out. It doesn’t update on Doomsday and it doesn’t say the large universe is $10^9$ times as likely as the small one. It doesn’t give a crazy answer anywhere. And it does it all based on simple principles.
Does that answer the question? It’s possible that I should have started the sequence with a post that states the problem; like I just assumed everyone would know the problem without ever thinking about whether that’s actually the case.
Could you explain why the Doomsday argument answer seems absurd, or why I don’t have to be a human who was actually born?
I think so, thanks.
“The experiment being repeated sufficiently often might be considered a reasonably mild restriction; in particular, it is a given if the universe is large enough that everything which appears once appears many times.”
Why is that a given? The set of integers is very large, but the number 3 only appears once in it.
I think the relevant difference is that, in the set of integers, each element is strictly more complex than the previous one, but in the universe, you can probably upper bound the complexity (that’s what I’m assuming, anyway). So eventually stuff should repeat, and then anything that has a nonzero probability of appearing will appear arbitrarily often as you increase the size. For example, if there’s an upper bound to the complexity of a planet, then you can only have that many planets until you get a repeat.
That doesn’t seem to follow, actually. You could easily have a very large universe that’s almost entirely empty space (which does “repeat”), plus a moderate amount of structures that only appear once each.
And as a separate argument, plenty of processes are irreversible in practice. For instance, consider a universe where there’s a “big bang” event at the start of time, like an ordinary explosion. I’d expect that universe to never return to that original intensely-exploding state, because the results of explosions don’t go backwards in time, right?
Yeah, nonemptiness was meant to be part of the assumption in the phrase you quoted.
We’re getting into territory where I don’t feel qualified to argue – although it seems like that objection only applies to some very specific things, and probably not to most Sleeping Beauty like scenarios.
Not by algorithmic complexity. The integer consisting of a million 3s in a row is quite compressible.
But by number of bits, which is what you need to avoid repetition.
The typical answer is that this is a result of the Poincaré recurrence theorem
Thanks for the mention, I had never heard of that concept before.
I have strong reflexes of revulsion against this idea that everything must reoccur (aren’t plenty of processes irreversible in our world?), but it’s getting too off-topic for the original article, and I need to think more about this.