Insights from the randomness/ignorance model are genuine

Rafael Harth 13 Nov 2019 16:18 UTC

6 points

(Based on the randomness/ignorance model proposed in 1 $\to$ 2 $\to$ 3.)

The bold claim of this sequence thus far is that the randomness/ignorance model solves a significant part of the anthropics puzzle. (Not everything since it’s still incomplete.) In this post I argue that this “solution” is genuine, i.e. it does more than just redefine terms. In particular, I argue that my definition of probability for randomness is the only reasonable choice.

The only axiom I need for this claim is that probability must be consistent with betting odds in all cases: if $H$ comes true in two of three situations where $O$ is observed, and this is known, then $P (H | O)$ needs to be $\frac{2}{3}$ , and no other answer is acceptable. This idea isn’t new; the problem with it is that it doesn’t actually produce a definition of probability, because we might not know how often $H$ comes true if $O$ is observed. It cannot define probability in the original Presumptuous Philosopher problem, for example.

But in the context of the randomness/ignorance model, the approach becomes applicable. Stating my definition for when uncertainty is random in one sentence, we get

Your uncertainty about $H$ , given observation $O$ , is random iff you know the relative frequency with which $H$ happens, evaluated across all observations $O^{'}$ that, for you, are indistinguishable from $O$ with regard to $H$ .

Where “relative frequency” is the frequency of $H$ compared to $\neg H$ , i.e. you know that $H$ happens in $n$ out of $m$ cases. A good look at this definition shows that it is precisely the condition needed to apply the betting odds criterion. So the model simply divides everything into those cases where you can apply betting odds and those where you can’t.

If the Sleeping Beauty experiment is repeated sufficiently often using a fair coin, then roughly half of all experiments will run in the 1-interview version, and the other half will run the 2-interview version. In that case, Sleeping Beauty’s uncertainty is random and the reasoning from 3 goes through to output $\frac{2}{3}$ for it being Monday. The experiment being repeated sufficiently often might be considered a reasonably mild restriction; in particular, it is a given if the universe is large enough that everything which appears once appears many times. Given that Sleeping Beauty is still controversial, the model must thus be either nontrivial or wrong, hence “genuine”.
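The frequency claim can be checked with a small simulation. The sketch below is illustrative and not part of the original argument. It assumes that heads triggers the 1-interview version and tails the 2-interview version (the post does not say which is which), and the names `simulate_interviews` and `monday_fraction` are made up for this example.

```python
import random


def simulate_interviews(num_experiments: int, seed: int = 0) -> dict:
    """Repeat the Sleeping Beauty experiment and count interviews per day."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    counts = {"monday": 0, "tuesday": 0}
    for _ in range(num_experiments):
        heads = rng.random() < 0.5  # fair coin
        counts["monday"] += 1  # Beauty is interviewed on Monday in both versions
        if not heads:
            counts["tuesday"] += 1  # assumed: tails adds the Tuesday interview
    return counts


counts = simulate_interviews(100_000)
total_interviews = counts["monday"] + counts["tuesday"]
monday_fraction = counts["monday"] / total_interviews
print(f"fraction of interviews that fall on Monday: {monday_fraction:.3f}")
```

Across many repetitions, roughly two of every three interviews are Monday interviews, which is the relative frequency the betting-odds criterion needs.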

Here is an alternative justification for my definition of random probability. Suppose $H$ is the hypothesis we want to evaluate (like “today is Monday”) and $O$ is the full set of observations we currently have (formally, the full brain state of Sleeping Beauty). Then what we care about is the value of $P (H | O)$ . Now consider the term $\frac{P (H | O)}{P (\neg H | O)}$ ; let’s call it $λ$ . If $λ$ is known, then $P (H | O)$ can be computed as $P (H | O) = (1 + λ^{- 1})^{- 1}$ , so knowledge of $λ$ implies knowledge of $P (H | O)$ and vice-versa. But $λ$ is more “fundamental” than $P (H | O)$ , in the sense that it can be defined as the ratio of two frequencies. Take all situations in which $O$ – or any other set of observations $O^{'}$ which, from your perspective, is indistinguishable from $O$ – is observed, and count in how many of those $H$ is true vs. false. The ratio of these two values is $λ$ .

A look at the above criterion for randomness shows that it’s just another way of saying that the value of $λ$ is known. Since, again, the value of $λ$ determines the value of $P (H | O)$ , this means that the definition of probability as betting odds, in the case that the relevant uncertainty is random, falls almost directly out of the formula.
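To make the step from $λ$ to $P (H | O)$ concrete, here is a minimal sketch using exact fractions. The function names are hypothetical, and the counts (2 situations where $H$ is true, 1 where it is false) are simply the two-out-of-three example from earlier in the post.

```python
from fractions import Fraction


def odds_from_counts(h_true: int, h_false: int) -> Fraction:
    """lambda: ratio of indistinguishable situations where H is true vs. false.

    Raises ZeroDivisionError if H is never false (the odds are then infinite).
    """
    return Fraction(h_true, h_false)


def probability_from_odds(lam: Fraction) -> Fraction:
    """P(H | O) = (1 + lambda^-1)^-1, valid for lambda > 0."""
    return 1 / (1 + 1 / lam)


lam = odds_from_counts(2, 1)          # H true in 2 situations, false in 1
p_h_given_o = probability_from_odds(lam)
print(lam, p_h_given_o)
```

With $H$ true in $n$ of $m$ situations, $λ = n / (m - n)$ and the formula returns $n / m$, so knowing the counts, knowing $λ$, and knowing the betting odds are the same thing.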



interstice 13 Nov 2019 22:57 UTC
5 points
This seems like a step backwards from UDASSA, another potential solution to many anthropic problems. UDASSA has a completely formal specification, while this model relies on a somewhat unclear verbal definition. So you need to know the ‘relative frequency’ with which H happens. But what are we averaging over here? Our universe? All possible universes? If uncertain about which universe we are in, how should we average over the different universes? What if we are reasoning about an event which, as far as we know, will only happen once?
- Rafael Harth 14 Nov 2019 0:02 UTC
  1 point
  Parent
  I have answers to all of these questions! I just haven’t posted them yet. If I present an entirely new theory in one super long post, then obviously no-one reads it. In fact, it would be irrational to read it because the prior that I’m onto something is just too low to invest the time. A sequence of short posts where each post makes a point which can be understood by anyone having read up to that post – that’s not optimal, but how else could you do it? This is a completely genuine question if you have an answer.
  So the structure I’ve chosen is to first state the distinction, then lay out the model that deals with randomness only (because that already does some stuff which SIA and SSA can’t), then explain how to deal with ignorance, which makes the model complete, and then present a formalized version. The questions you just listed all deal with the ignorance part, the part that’s still in the pipeline.
  Well, and I didn’t know I was competing with UDASSA, because I didn’t know it existed. For some reason it’s sitting at 38 karma, which makes it easy to miss, and you’re the first to bring it up. I’ll read it before I post anything else.
  - interstice 14 Nov 2019 0:13 UTC
    1 point
    Parent
    It’s true that UDASSA is tragically underrated, given that (it seems to me) it provides a satisfactory resolution to all anthropic problems. I think this might be a situation where people tend to leave the debate and move on to something else when they seem to have found a satisfactory position, like how most LW people don’t bother arguing about whether god exists anymore.
    - Wei Dai 14 Nov 2019 3:03 UTC
      2 points
      Parent
      
      I think this might be a situation where people tend to leave the debate and move on to something else when they seem to have found a satisfactory position
      
      Well not exactly, I came up with UDASSA originally but found it not entirely satisfactory, so I moved on to something that eventually came to be called UDT. I wrote down my reasons at against UD+ASSA and under Paul’s post.
      
      Perhaps it would be good to have this history be more readily available to people looking for solutions to anthropic reasoning though, if you guys have suggestions on how to do that.
      - Rafael Harth 14 Nov 2019 9:59 UTC
        1 point
        Parent
        The solution to this kind of thing should be a wiki, I think. If the LessWrong wiki were kept up to date enough to have a page on anthropics, that would have solved the issue in this case and should work for many similar cases.
      - interstice 14 Nov 2019 3:35 UTC
        1 point
        Parent
        Right, I knew that many people had since moved on to UDT due to limitations of UDASSA for decision-making. What I meant was that UDASSA seems to be satisfactory at resolving the typical questions about anthropic probabilities, setting aside decision theory/noncomputability issues.
        I agree it would be nice to have all this information in a readily accessible place. Maybe the posts setting out the ideas and later counter-arguments could be put in a curated sequence.
        Rafael Harth 14 Nov 2019 10:01 UTC
        1 point
        Parent
        I actually knew about UDT. Enough to understand how it wins in Transparent Newcomb, but not enough to understand that it extends to anthropic problems.
    - Rafael Harth 14 Nov 2019 0:30 UTC
      1 point
      Parent
      The ASSA is the Absolute Self Selection Assumption. It is a variant on the Self Selection Assumption (SSA) of Nick Bostrom. The SSA says that you should think of yourself as being a randomly selected conscious entity (aka “observer”) from the universe. The Absolute SSA extends this concept to “observer moments” (OMs). An observer moment is one moment of existence of an observer’s consciousness. If we think of conscious experience as a process, the OM is created by dividing this process up into small units of time such that no perceptible change occurs within that unit. The ASSA then says that you should think of the OM you are presently experiencing as being randomly selected from among all OMs in the universe.
      This is what I’m doing. I haven’t read the entire thing yet, but this paragraph basically explains the key idea of my model. I was going to address how to count instances eventually (near the end), and it bottoms out at observer moments. The full idea, abbreviated, is “start with a probability distribution over different universes, in each one apply the randomness thing via counting observer moments, then weigh those results with your distribution”. This gives you intuitive results in Doomsday (no update), P/P (some bias towards larger universe depending on how strongly you believe in other universes), Sleeping Beauty (basically ¹⁄₃) and the “how do we update on X-risk given that we’re still alive” question (complicated).
      It appears that I independently came up with ASSA, plus a different way of presenting it. And probably a weaker formalism.
      I’m obviously unhappy about this, but thank you for bringing it to my attention now rather than later.
      One reason I was assuming there couldn’t be other theories I was unaware of is that Stuart Armstrong was posting about anthropics and he seemed totally unaware.
      - interstice 14 Nov 2019 2:28 UTC
        1 point
        Parent
        Yeah, I also had similar ideas for solving anthropics a few years ago, and was surprised when I learned that UDASSA had been around for so long. At least you can take pride in having found the right answer independently.
        I think that UDASSA gives P(heads) = ¹⁄₂ on the Sleeping Beauty problem due to the way it weights different observer-moments, proportional to 2^(-description length). This might seem a bit odd, but I think it’s necessary to avoid problems with Boltzmann brains and the like.
        Rafael Harth 14 Nov 2019 10:04 UTC
        1 point
        Parent
        You mean P(monday)? In that case it would be different although have some similarity. Why is the description length of the monday observer moment longer than the tuesday one?
        interstice 14 Nov 2019 17:47 UTC
        1 point
        Parent
        No, I mean Beauty’s subjective credence that the coin came up heads. That should be ¹⁄₂ by the nature of a coin flip. Then, if the coin comes up tails, you need 1 bit to select between the subjectively identical states of waking up on Monday or Tuesday. So in total:
        
        P(heads, Monday) = ¹⁄₂,
        
        P(tails, Monday) = ¹⁄₄
        
        P(tails, Tuesday) = ¹⁄₄
        
        (EDIT: actually this depends on how difficult it is to locate memories on Monday vs. Tuesday, which might be harder given that your memory has been erased. I think that for ‘natural’ ways of locating your consciousness it should be close to $\frac{1}{2}$ / $\frac{1}{4}$ / $\frac{1}{4}$ though)
        
        (DOUBLE EDIT, MUCH LATER: actually it now seems to me like the thirder position might apply here, since the density of spacetime locations with the right memories is higher in the tails branch than the heads)
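The first-pass weighting described in this comment (before the two edits) can be sketched as follows. The bit counts are taken from the comment itself: 1 bit for the coin, plus 1 more bit under tails to pick the day. The dictionary layout and variable names are assumptions made for illustration, not a formal statement of UDASSA.

```python
from fractions import Fraction

# Bits needed to single out each observer-moment, as counted in the comment.
description_bits = {
    ("heads", "Monday"): 1,   # 1 bit for the coin
    ("tails", "Monday"): 2,   # 1 bit for the coin + 1 bit for the day
    ("tails", "Tuesday"): 2,
}

# Weight each observer-moment by 2^(-description length), then normalize.
weights = {om: Fraction(1, 2 ** bits) for om, bits in description_bits.items()}
total_weight = sum(weights.values())
credence = {om: w / total_weight for om, w in weights.items()}

p_heads = sum(p for (coin, _day), p in credence.items() if coin == "heads")
p_monday = sum(p for (_coin, day), p in credence.items() if day == "Monday")
print(credence, p_heads, p_monday)
```

Under this counting the credence in heads stays at ¹⁄₂ while the credence in Monday comes out at ¾, which differs from the ⅔ obtained by counting interviews with equal weight.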
Gordon Seidoh Worley 13 Nov 2019 20:23 UTC
4 points
I guess I’m a bit out of the loop on questions about how to define uncertainty, so I’m a bit confused about what position you are against or how this is different from what others do. That is, it seems to me like you are trying to fix a problem you perceive in the way people currently think about uncertainty, but I’m not sure what that problem is so that I can even understand how this framing might fix it. I’ve been reading this sequence of posts thinking “yeah, sure, this all sounds reasonable” but also without really understanding the context for it. I know you did the post on anthropics, but even there it wasn’t really that clear to me how this framing helps us over what is perhaps otherwise normally done, although perhaps that reflects my ignorance of existing arguments about what methods of anthropic reasoning are correct.
- Rafael Harth 13 Nov 2019 22:27 UTC
  3 points
  Parent
  Yeah, I wrote this assuming people have the context.
  So there’s a class of questions where standard probability theory doesn’t give clear answers. This was dubbed anthropics or anthropic probability. To deal with this, two principles were worked out, SSA and SIA, which are well-defined and produce answers. But for both of them, there are problems where their answers seem absurd.
  I think the best way to understand the problem of anthropics is by looking at the Doomsday argument as an example. Consider all humans who will ever live (assuming they’re not infinitely many). Say that’s $N$ many. For simplicity, we assume that there are only two cases, either humanity goes extinct tomorrow, in which case $N$ is about sixty billion – but let’s make that $10^{11}$ for simplicity – or humanity flourishes and expands through the cosmos, in which case $N$ is, say, $10^{18}$ . Let’s call $S$ the hypothesis that humans go extinct, and $L$ the hypothesis that they don’t (that’s for “short” and “long” human history). Now we want to update on $P (L)$ given the observation that you are human number $n$ (so $n$ will be about 30 billion). Let’s call that observation $O$ . Also let $p$ be your prior on $L$ , so $P (L) = p$ .
  The Doomsday argument now goes as follows. The term $P (O | L)$ is $10^{- 18}$ , because if $L$ is true then there are a total of $10^{18}$ people, each position is equally likely, so $10^{- 18}$ is just the chance to get your particular one. On the other hand, $P (O | S)$ is $10^{- 11}$ , because if $S$ is true there are only $10^{11}$ people total. So we simply apply Bayes on the observation $O$ , and then use the law of total probability in the denominator to obtain
  $P (L | O) = P (O | L) \frac{P (L)}{P (O)} = 10^{- 18} \frac{p}{P (O | L) P (L) + P (O | \neg L) P (\neg L)} = \frac{10^{- 18} p}{10^{- 18} p + 10^{- 11} (1 - p)}$
  If $p = 0.999$ , this term equals about 0.0001. So even if you were very confident that humanity would make it, you should still assign only about 0.01% to that after updating. If you want to work it out yourself, this is where you should pause and think about what part of this is wrong.
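For readers who want to redo the arithmetic, here is a minimal sketch of the update. It plugs in the likelihoods stated above, $P (O | L) = 10^{- 18}$ and $P (O | S) = 10^{- 11}$ ; the function name `posterior_long` and its parameter names are hypothetical.

```python
def posterior_long(prior_long: float,
                   n_short: int = 10**11,
                   n_long: int = 10**18) -> float:
    """Doomsday-style Bayes update: P(L | O) when each birth rank is equally likely."""
    likelihood_long = 1 / n_long    # P(O | L): one rank out of n_long
    likelihood_short = 1 / n_short  # P(O | S): one rank out of n_short
    # Law of total probability for P(O)
    evidence = likelihood_long * prior_long + likelihood_short * (1 - prior_long)
    return likelihood_long * prior_long / evidence


posterior = posterior_long(0.999)
print(f"P(L | O) = {posterior:.6f}")
```

Even a prior of 0.999 on $L$ ends up below 1% after the update, because the likelihood ratio between the two hypotheses is so lopsided.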
  So the part that’s problematic is the probability for $P (O | L)$ . There is a hidden assumption that you had to be one of the humans who was actually born. This was then dubbed the Self-Sampling Assumption (SSA), namely
  All other things equal, an observer should reason as if they are randomly selected from the set of all actually existent observers (past, present and future) in their reference class.
  So SSA endorses the Doomsday argument. The principled way to debunk this is the Self-Indication Assumption (SIA), which says
  All other things equal, an observer should reason as if they are randomly selected from the set of all possible observers.
  If you apply SIA, then $P (O | L) = P (O | S)$ and hence $P (L | O) = P (L)$ . Updating on $O$ no longer does anything.
  So this is the problem where SSA gives a stupid answer. The problem where SIA gives the stupid answer is the Presumptuous Philosopher problem: there are two theories of how large the universe is, according to one it’s $10^{9}$ times as large as it is according to the other. If you apply the SIA rule, you get that the probability of living in the small universe is $\frac{1}{1 + 10^{9}}$ (if the prior was $\frac{1}{2}$ on both).
  There is also Full Non-indexical Conditioning which is technically a different theory, and it argues differently, but it outputs the same as SIA in every case, so basically there are just the two. And that, as far as I know, is the state of the art. No-one has come up with a theory that can’t be made to look ridiculous. Stuart Armstrong has made a bunch of LW posts about this recently-ish, but he hasn’t proposed a solution, he’s pointed out that existing theories are problematic. This one, for example.
  I’ve genuinely spent a lot of time thinking really hard about this stuff, and my conclusion is that the “reason as if you’re randomly selected from a set of observers” thing is the key problem here. I think that’s the reason why this still hasn’t been worked out. It’s just not the right way to look at it. I think the relevant variable which everyone is missing is that there are two fundamentally different kinds of uncertainty, and if you structure your theory around that, everything works out. And I think I do have a theory where everything works out. It doesn’t update on Doomsday and it doesn’t say the large universe is $10^{9}$ times as likely as the small one. It doesn’t give a crazy answer anywhere. And it does it all based on simple principles.
  Does that answer the question? It’s possible that I should have started the sequence with a post that states the problem; like I just assumed everyone would know the problem without ever thinking about whether that’s actually the case.
  - clone of saturn 14 Nov 2019 1:24 UTC
    2 points
    Parent
    Could you explain why the Doomsday argument answer seems absurd, or why I don’t have to be a human who was actually born?
  - Gordon Seidoh Worley 13 Nov 2019 22:50 UTC
    2 points
    Parent
    I think so, thanks.
Thelo 13 Nov 2019 21:39 UTC
2 points
“The experiment being repeated sufficiently often might be considered a reasonably mild restriction; in particular, it is a given if the universe is large enough that everything which appears once appears many times.”
Why is that a given? The set of integers is very large, but the number 3 only appears once in it.
- Rafael Harth 13 Nov 2019 22:49 UTC
  2 points
  Parent
  I think the relevant difference is that, in the set of integers, each element is strictly more complex than the previous one, but in the universe, you can probably upper bound the complexity (that’s what I’m assuming, anyway). So eventually stuff should repeat, and then anything that has a nonzero probability of appearing will appear arbitrarily often as you increase the size. For example, if there’s an upper bound to the complexity of a planet, then you can only have that many planets until you get a repeat.
  - Thelo 14 Nov 2019 17:51 UTC
    1 point
    Parent
    That doesn’t seem to follow, actually. You could easily have a very large universe that’s almost entirely empty space (which does “repeat”), plus a moderate amount of structures that only appear once each.
    And as a separate argument, plenty of processes are irreversible in practice. For instance, consider a universe where there’s a “big bang” event at the start of time, like an ordinary explosion. I’d expect that universe to never return to that original intensely-exploding state, because the results of explosions don’t go backwards in time, right?
    - Rafael Harth 14 Nov 2019 18:10 UTC
      1 point
      Parent
      That doesn’t seem to follow, actually. You could easily have a very large universe that’s almost entirely empty space (which does “repeat”), plus a moderate amount of structures that only appear once each.
      Yeah, nonemptiness was meant to be part of the assumption in the phrase you quoted.
      And as a separate argument, plenty of processes are irreversible in practice. For instance, consider a universe where there’s a “big bang” event at the start of time, like an ordinary explosion. I’d expect that universe to never return to that original intensely-exploding state, because the results of explosions don’t go backwards in time, right?
      We’re getting into territory where I don’t feel qualified to argue – although it seems like that objection only applies to some very specific things, and probably not to most Sleeping Beauty like scenarios.
  - TAG 14 Nov 2019 10:05 UTC
    1 point
    Parent
    
    the set of integers, each element is strictly more complex than the previous one
    
    Not by algorithmic complexity. The integer consisting of a million 3s in a row is quite compressible.
    - Rafael Harth 14 Nov 2019 10:09 UTC
      1 point
      Parent
      But by number of bits, which is what you need to avoid repetition.
- shirisaya 13 Nov 2019 21:48 UTC
  2 points
  Parent
  The typical answer is that this is a result of the Poincaré recurrence theorem
  - Thelo 14 Nov 2019 17:42 UTC
    2 points
    Parent
    Thanks for the mention, I had never heard of that concept before.
    I have strong reflexes of revulsion against this idea that everything must reoccur (aren’t plenty of processes irreversible in our world?), but it’s getting too off-topic for the original article, and I need to think more about this.
