StellaAthena comments on 0 And 1 Are Not Probabilities

StellaAthena 20 Aug 2015 8:49 UTC
15 points
This article is largely incoherent. The main justification is the abuse of an invalid transformation: y=x/(1-x) is not the bijection that he asserts it is, because it’s not a function that maps [0,1] onto R. It’s a function that maps [0,1] onto [0,\infty] as a subset of the topological closure of R. And that’s okay, but you can’t say “well I don’t like the topological closure of R, so I’ll just use R and claim that 1 is where the problem is.”

Additionally, his discussion of log odds and such is perfectly fine, but ignores the fact that there are places where you do need to have an odds of 0:1, or a log odds of negative infinity. Probability theory stops working when you throw out 0 and 1, it’s as simple as that.
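
A minimal sketch of the odds and log-odds maps discussed above, assuming the extended-real convention in which probability 1 is sent to infinite odds. The function names `odds` and `log_odds` are illustrative, not taken from the article:

```python
import math


def odds(p: float) -> float:
    """Map a probability in [0, 1] to odds in [0, inf]."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if p == 1.0:
        # Assumption: use the extended reals, so certainty maps to +infinity.
        return math.inf
    return p / (1.0 - p)


def log_odds(p: float) -> float:
    """Map a probability in [0, 1] to log odds in [-inf, inf]."""
    o = odds(p)
    if o == 0.0:
        # math.log(0.0) raises ValueError, so the endpoint is handled explicitly.
        return -math.inf
    return math.log(o)
```

With these definitions, `odds(0.0)` is `0.0` and `log_odds(0.0)` is `-math.inf`, so the endpoints are only reachable once the infinite values are admitted into the target set.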

Even if you don’t want to handle tautologies or contradictions, there are other ways to get P(X)=0 or 1. The probability that a real number chosen uniformly from the real interval [0,1] equals any particular value, such as 0, is 0. It has to be. It’s a provable fact under ZFC, and to decide otherwise is to say that you’re more attached to the idea of 0 and 1 not being probabilities than you are to the fact that mathematics is consistent. If you really believe that, well, there’s absolutely nothing I have to say to you.
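
One way to see why a single point must get probability 0, sketched under the assumption that X is uniform on [0,1], so that the probability of a subinterval is at most its length. The symbols X, c and \varepsilon are introduced only for this sketch:

```latex
\[
  0 \le P(X = c) \le P\bigl(c - \varepsilon \le X \le c + \varepsilon\bigr) \le 2\varepsilon
  \quad \text{for every } \varepsilon > 0,
  \qquad \text{so} \qquad P(X = c) = 0 .
\]
```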

This is one of those situations where EY just demonstrates he knows very little mathematics.
- Nikolaus Hansen 26 Dec 2019 15:20 UTC
  2 points
  Parent
  y=x/(1-x) is not the bijection that he asserts it is, [...]. It’s a function that maps [0,1] onto [0,\infty] as a subset of the topological closure of R.
  How is that not a bijection? Specifically, a bijection between the sets $[0, 1[ \cup \{1\}$ and $\mathbb{R}_{\geq 0} \cup \{\infty\}$, which seems to be exactly the claim EY is making.
  On a broader point, EY was not calling into question the correctness or consistency of mathematical concepts or claims but whether they have any useful meaning in reality. He was not talking about the map, he was talking about the territory and how we may improve the map to better reflect the territory.
- David_Bolin 20 Aug 2015 13:10 UTC
  1 point
  Parent
  Eliezer isn’t arguing with the mathematics of probability theory. He is saying that in the subjective sense, people don’t actually have absolute certainty. This would mean that mathematical probability theory is an imperfect formalization of people’s subjective degrees of belief. It would not necessarily mean that it is impossible in principle to come up with a better formalization.
  - Lumifer 20 Aug 2015 14:44 UTC
    1 point
    Parent
    
    Eliezer isn’t arguing with the mathematics of probability theory. He is saying that in the subjective sense, people don’t actually have absolute certainty.
    
    Errr… as I read EY’s post, he is certainly talking about the mathematics of probability (or about the formal framework in which we operate on probabilities) and not about some “subjective sense”.
    
    The claim of “people don’t actually have absolute certainty” looks iffy to me, anyway. The immediate two questions that come to mind are (1) How do you know? and (2) Not even a single human being?
    - David_Bolin 20 Aug 2015 18:50 UTC
      0 points
      Parent
      Of course if no one has absolute certainty, this very fact would be one of the things we don’t have absolute certainty about. This is entirely consistent.
    - Wes_W 20 Aug 2015 17:02 UTC
      0 points
      Parent
      If we’re asking what the author “really meant” rather than just what would be correct, it’s on record.
      
      The argument for why zero and one are not probabilities is not, “All objects which are special cases should be cast out of mathematics, so get rid of the real zero because it requires a special case in the field axioms”, it is, “ceteris paribus, can we do this without the special case?” and a bit of further intuition about how 0 and 1 are the equivalents of infinite probabilities, where doing our calculations without infinities when possible is ceteris paribus regarded as a good idea by certain sorts of mathematicians. E.T. Jaynes in “Probability Theory: The Logic of Science” shows how many probability-theoretic errors are committed by people who assume limits directly into their calculations, without first showing the finite calculation and then finally taking its limit. It is not unreasonable to wonder when we might get into trouble by using infinite odds ratios. Furthermore, real human beings do seem to often do very badly on account of claiming to be infinitely certain of things so it may be pragmatically important to be wary of them.
      
      I… can’t really recommend reading the entire thread at the link, it’s kind of flame-war-y and not very illuminating.
      - EHeller 20 Aug 2015 17:14 UTC
        4 points
        Parent
        I think the issue at hand is that 0 and 1 aren’t special cases at all, but very important for the math of probability theory to work (try and construct a probability measure where some subset doesn’t have probability 1 or 0).
        
        This is incredibly necessary for the mathematical idea of probability, and EY seems to be confusing “are 0 and 1 probabilities relevant to Bayesian agents?” with “are 0 and 1 probabilities?” (yes, they are, unavoidably, not as a special case!).
      - Lumifer 20 Aug 2015 17:18 UTC
        −1 points
        Parent
        It seems that EY’s position boils down to
        
        Pragmatically speaking, the real question for people who are not AI programmers is whether it makes sense for human beings to go around declaring that they are infinitely certain of things. I think the answer is that it is far mentally healthier to go around thinking of things as having ‘tiny probabilities much larger than one over googolplex’ than to think of them being ‘impossible’.
        
        And that’s a weak claim. EY’s ideas of what is “mentally healthier” are, basically, his personal preferences. I, for example, don’t find any mental health benefits in thinking about one over googolplex probabilities.
        Wes_W 20 Aug 2015 17:27 UTC
        0 points
        Parent
        Cromwell’s Rule is not EY’s invention, and relatively uncontroversial for empirical propositions (as opposed to tautologies or the like).
        
        If you don’t accept treating probabilities as beliefs and vice versa, then this whole conversation is just a really long and unnecessarily circuitous way to say “remember that you can be wrong about stuff”.
        EHeller 20 Aug 2015 17:44 UTC
        3 points
        Parent
        The part that is new compared to Cromwell’s rule is that Yudkowsky doesn’t want to give probability 1 to logical statements (53 is a prime number).
        
        Because he doesn’t want to treat 1 as a probability, you can’t expect complete sets of events to have total probability 1, despite them being tautologies. Because he doesn’t want probability 0, how do you handle the empty set? How do you assign probabilities to statements like “A and B” where A and B are logically exclusive? (the coin lands heads AND the coin lands tails).
        
        Removing 0 and 1 from the math of probability breaks most of the standard manipulations. Again, it’s best to just say “be careful with 0 and 1 when working with odds ratios.”
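
        A minimal sketch of the two cases raised above, assuming a fair coin modelled as a two-outcome sample space with equally likely outcomes. The names `OMEGA`, `prob`, `heads` and `tails` are illustrative and not from the thread:

```python
from fractions import Fraction

# Assumed sample space for one flip of a fair coin.
OMEGA = frozenset({"H", "T"})


def prob(event: frozenset) -> Fraction:
    """Probability of an event (a subset of OMEGA) under equally likely outcomes."""
    return Fraction(len(event & OMEGA), len(OMEGA))


heads = frozenset({"H"})
tails = frozenset({"T"})

# "Heads or tails" is the whole sample space; "heads and tails" is the empty set.
certain = prob(heads | tails)
impossible = prob(heads & tails)
```

        Here `certain` is exactly 1 and `impossible` is exactly 0, which are the two values the standard manipulations rely on.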
        Lumifer 20 Aug 2015 17:48 UTC
        2 points
        Parent
        Nobody is saying EY invented Cromwell’s Rule, that’s not the issue.
        
        The issue is that “0 and 1 are not useful subjective certainties for a Bayesian agent” is a very different statement than “0 and 1 are not probabilities at all”.
        Wes_W 20 Aug 2015 18:05 UTC
        0 points
        Parent
        You’re right, I misread your sentence about “his personal preferences” as referring to the whole claim, rather than specifically the part about what’s “mentally healthy”. I don’t think we disagree on the object level here.
    - Gram_Stone 20 Aug 2015 15:50 UTC
      0 points
      Parent
      
      The claim of “people don’t actually have absolute certainty” looks iffy to me, anyway. The immediate two questions that come to mind are (1) How do you know? and (2) Not even a single human being?
      
      The way I view that statement is: “In our formalization, agents with absolutely certain beliefs cannot change those beliefs, we want our formalization to capture our intuitive sense of how an ideal agent would update its beliefs, a formalization with a quality of fanaticism does not capture our intuitive sense of how an ideal agent would update its beliefs, therefore we do not want a quality of fanaticism.”
      
      And what state of the world would correspond to the statement “Some people have absolute certainty.” ? Do you think that we can take some highly advanced and entirely fictional neuroimaging technology, look at a brain and meaningfully say, “There’s a belief with probability 1.” ?
      
      And on the other hand, I’m not afraid to talk about folk certainty, where the properties of an ideal mathematical system are less relevant, where everyone can remain blissfully logically uncertain to the fact that beliefs with probability 1 and 0 imply undesirable consequences in formal systems that possess them, and say things like “I believe that absolutely.” I am not afraid to say something like, “That person will not stop believing that for as long as he lives,” and mean that I predict with high confidence that that person will not stop believing that for as long as he lives.
      
      And once you believe that the formalization is trying to capture our intuitive sense of an ideal agent, and decide whether or not that quality of fanaticism captures it, and decide whether or not you’re going to be a stickler about folk language, then I don’t think that any question or confusion around that claim remains.
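
      The premise above, that an agent with an absolutely certain belief cannot change it, can be read off Bayes’ rule. A sketch, where H stands for a hypothesis and E for any evidence with P(E) > 0 (both symbols are introduced only for this sketch):

```latex
\[
  P(H \mid E) = \frac{P(E \mid H)\, P(H)}{P(E)}, \qquad P(E) > 0 .
\]
\[
  P(H) = 0 \;\Rightarrow\; P(H \mid E) = 0,
  \qquad
  P(H) = 1 \;\Rightarrow\; P(E) = P(E \mid H) \;\Rightarrow\; P(H \mid E) = 1 .
\]
```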
      - Lumifer 20 Aug 2015 15:57 UTC
        0 points
        Parent
        People are not “ideal agents”. If you specifically construct your formalization to fit your ideas of what an ideal agent should and should not be able to do, this formalization will be a poor fit to actual, live human beings.
        
        So either you make a system for ideal agents—in which case you’ll still run into some problems because, as has been pointed out upthread, standard probability math stops working if you disallow zeros and ones—or you make a system which is applicable to our imperfect world with imperfect humans.
        Gram_Stone 20 Aug 2015 21:59 UTC
        1 point
        Parent
        I don’t see why both aren’t useful. If you want a descriptive model instead of a normative one, try prospect theory.
        
        I just don’t see this article as an axiom that says probabilities of 0 and 1 aren’t allowed in probability theory. I see it as a warning not to put 0s and 1s in your AI’s prior. You’re not changing the math so much as picking good priors.
    - Bound_up 20 Aug 2015 15:23 UTC
      0 points
      Parent
      I think he’s just acknowledging the minute(?) possibility that our apparently flawless reasoning could have a blind spot. We could be in a Matrix, or have something tampering with our minds, etcetera, such that the implied assertion:
      
      If this appears absolutely certain to me
      
      Then it must be true
      
      is indefensible.
      - Lumifer 20 Aug 2015 15:43 UTC
        1 point
        Parent
        There are two different things.
        
        David_Bolin said (emphasis mine): “He is saying that in the subjective sense, people don’t actually have absolute certainty.” I am interpreting this as “people never subjectively feel they have absolute certainty about something” which I don’t think is true.
        
        You are saying that from an external (“objective”) point of view, people can not (or should not) be absolutely sure that their beliefs/conclusions/maps are true. This I easily agree with.
        David_Bolin 20 Aug 2015 19:08 UTC
        0 points
        Parent
        It should probably be defined by calibration: do some people have a type of belief where they are always right?
        Lumifer 20 Aug 2015 19:36 UTC
        0 points
        Parent
        Self-referential and anthropic things would probably qualify, e.g. “I believe I exist”.
        StellaAthena 20 Aug 2015 20:33 UTC
        −1 points
        Parent
        You can phrase statements of logical deduction such that they have no premises and only conclusions. If we let S be the set of logical principles under which our logical system operates and T be some sentence that entails Y, then S AND T implies Y is something that I have absolute certainty in, even if this world is an illusion, because the premise of the implication contains all the rules necessary to derive the result.
        
        A less formal example of this would be the sentence: If the rules of logic as I know them hold and the axioms of mathematics are true, then it is the case that 2+2=4
- Regex 20 Aug 2015 9:35 UTC
  1 point
  Parent
  As someone who doesn’t know much beyond basic statistics, in what way are 0 or 1 probabilities? Isn’t it just axiomatic truth at that point? In that sense saying zero and one are probabilities is just saying ‘certain’ or ‘impossible’ as far as I understand it. Situations where an event will definitely or definitely not occur don’t seem to be consistent with the idea of randomness which I’ve understood probability to revolve around.
  
  I suppose the alternative would be that we’d have to assume every mathematical proof has infinite evidence if we wanted to get anywhere productive- after all axioms are assumed to be true. It doesn’t make much sense to need evidence in that scenario- except perhaps the probability of error and mistake? That isn’t particularly calculable and would actually change from person to person.
  
  Using one and zero makes sense to me as a matter of assumed or proven truths, but I’m still unsure how that makes it a probability.
  - StellaAthena 20 Aug 2015 20:14 UTC
    1 point
    Parent
    Formally, probability is defined via areas. The basic idea is that the probability of picking an element from a set A out of a set B is the ratio of the areas of A to B, where “area” can be defined not only for things like squares but also things like lines, or actually almost every* subset of R. So, let’s say you want to randomly select a real number from the interval [0,1] and want to know the odds it falls in a set, S. The area of [0,1] is 1, so the answer is just the area of S.
    
    If S={0}, then S has area zero. If S=[0,1), then S has area 1. Not only are both of these theoretical possibilities, they are practical ones too. There are real world examples of probability zero events (the only one that comes to mind involves QM though so I don’t want to bother with the details).
    
    Now, notice that this isn’t the same thing as “impossible”. Instead, it means more like “it won’t happen I promise even by the time the universe ends”. The way I tend to think about probability zero events is that they are so unlikely they are beyond the reach of the principle that as the number of trials increases, events become expected. For any nonzero probability, there is a number of trials, n, such that once you do it n times the expected number of occurrences reaches at least 1. That’s not the case with probability zero events. Probability 1 events can then be thought of as the negation of probability 0 events.
    
    *not actually “almost every” in a formal sense, but “almost any” in a “unless you go try to build a set that you can’t measure it probably has a well defined area” sense
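
    The point about the number of trials can be written out as a short calculation, assuming n trials that each succeed with the same probability p (the symbols n and p are used as in the comment above):

```latex
\[
  \mathbb{E}[\text{number of occurrences in } n \text{ trials}] = n p,
  \qquad
  n p \ge 1 \iff n \ge \frac{1}{p} \quad (p > 0),
  \qquad
  n \cdot 0 = 0 \ \text{for every } n .
\]
```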
    - Regex 21 Aug 2015 8:24 UTC
      1 point
      Parent
      That seems a solid enough explanation, but how can something of probability zero have a chance to occur? How then do you represent an impossible outcome? It seems like otherwise ‘zero’ is equivalent to ‘absurdly low’. That doesn’t quite jibe with my understanding.
      - StellaAthena 21 Aug 2015 21:37 UTC
        2 points
        Parent
        Impossible things also have a probability of zero. I totally understand that this seems a bit unintuitive, and the underlying structure (which includes things like infinities of different sizes) is generally pretty unintuitive at first. Which is kinda just saying “sorry, I can’t explain the intuition,” which is unfortunately true.
        Regex 22 Aug 2015 14:47 UTC
        0 points
        Parent
        I’m just going to think of it as taking the limit as evidence approaches infinity. Because a probability next to zero and zero are identical, zero then is a probability?
      - Stephen_Cole 22 Aug 2015 0:07 UTC
        1 point
        Parent
        I think one of the clearest expositions on these issues is ET Jaynes. The first three chapters (which is some of the relevant part) can be found at http://bayes.wustl.edu/etj/prob/book.pdf.
        Regex 22 Aug 2015 14:39 UTC
        0 points
        Parent
        “Not Found
        
        The requested URL /etj/prob/book.pdf. was not found on this server.”
        arundelo 22 Aug 2015 14:59 UTC
        2 points
        Parent
        Fixed Jaynes link (no trailing period).
        Stephen_Cole 22 Aug 2015 15:17 UTC
        0 points
        Parent
        Oops. Thanks for the fix!
        Regex 22 Aug 2015 15:01 UTC
        0 points
        Parent
        Ah. Thanks!
  - Epictetus 20 Aug 2015 14:30 UTC
    1 point
    Parent
    
    Situations where an event will definitely or definitely not occur don’t seem to be consistent with the idea of randomness which I’ve understood probability to revolve around.
    
    “Event” is a very broad notion. Let’s say, for example, that I roll two dice. The sample space is just a collection of pairs (a, b) where “a” is what die 1 shows and “b” is what die 2 shows. An event is any sub-collection of the sample space. So, the event that the numbers sum to 7 is the collection of all such pairs where a + b = 7. The probability of this event is simply the fraction of the sample space it occupies.
    
    If I rolled eight dice, then they’ll never sum to seven and I say that that event occurs with probability 0. If I secretly rolled an unknown number of dice, you could reasonably ask me the probability that they sum to seven. If I answer “0”, that just means that I rolled either a single die or at least eight dice. It doesn’t make the process less random nor the question less reasonable.
    
    If you treat an event as some question you can ask about the result of a random process, then 1 and 0 make a lot more sense as probabilities.
    
    For the mathematical theory of probability, there are plenty of technical reasons why you want to retain 1 and 0 as probabilities (and once you get into continuous distributions, it turns out that probability 1 just means “almost certain”).
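
    The dice example above can be checked by counting outcomes. A minimal sketch, assuming fair six-sided dice; the function name `prob_sum_is` is illustrative:

```python
from fractions import Fraction


def prob_sum_is(target: int, n_dice: int) -> Fraction:
    """Exact probability that n_dice fair six-sided dice sum to target."""
    # ways[t] = number of ordered rolls so far whose faces add up to t.
    ways = {0: 1}
    for _ in range(n_dice):
        nxt = {}
        for total, count in ways.items():
            for face in range(1, 7):
                nxt[total + face] = nxt.get(total + face, 0) + count
        ways = nxt
    # The event is a subset of the sample space; its probability is its share of it.
    return Fraction(ways.get(target, 0), 6 ** n_dice)
```

    For a target of seven, the result is exactly 0 with one die (the largest face is six) and with eight dice (the smallest sum is eight), and it is positive for two through seven dice.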
    - Regex 21 Aug 2015 8:36 UTC
      0 points
      Parent
      This is what I meant by something being a proven truth- within the rules set one can find outcomes which are axiomatically impossible or necessary. The process itself may be random, but calling it random when something impossible didn’t happen seems odd to me. The very idea that 1 may be not-quite-certain is more than a little baffling, and I suspect is the heart of the issue.
      - Epictetus 21 Aug 2015 14:01 UTC
        1 point
        Parent
        
        The very idea that 1 may be not-quite-certain is more than a little baffling, and I suspect is the heart of the issue.
        
        If 1 isn’t quite certain then neither is 0 (if something happens with probability 1, then the probability of it not happening is 0). It’s one of those things that pops up when dealing with infinity.
        
        It’s best illustrated with an example. Let’s say we play a game where we flip a coin and I pay you $1 if it’s heads and you pay me $1 if it’s tails. With probability 1, one of us will eventually go broke (see Gambler’s ruin). It’s easy to think of a sequence of coin flips where this never happens; for example, if heads and tails alternated. The theory holds that such a sequence occurs with probability 0. Yet this does not make it impossible.
        
        It can be thought of as the result of a limiting process. If I looked at sequences of N coin flips, counted the ones where no one went broke and divided this by the total number of possible sequences, then as I let N go to infinity this ratio would go to zero. This event occupies a region with area 0 in the sample space.
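
        The limiting process above can be sketched by exact counting. This assumes both players start with the same number of dollars, called `stake` here, which the comment does not specify; the function name `surviving_fraction` is also illustrative:

```python
from fractions import Fraction


def surviving_fraction(n_flips: int, stake: int) -> Fraction:
    """Fraction of all length-n_flips coin sequences in which nobody goes broke.

    Assumption: each player starts with `stake` dollars and $1 changes hands per flip.
    """
    # counts[k] = number of sequences so far that leave one player's net
    # winnings at k dollars without either player having hit zero.
    counts = {0: 1}
    for _ in range(n_flips):
        nxt = {}
        for net, ways in counts.items():
            for step in (1, -1):
                new = net + step
                # A player is broke once the net transfer reaches the full stake.
                if abs(new) < stake:
                    nxt[new] = nxt.get(new, 0) + ways
        counts = nxt
    return Fraction(sum(counts.values()), 2 ** n_flips)
```

        With a stake of 3, `surviving_fraction(3, 3)` is 3/4, since only three heads or three tails in a row ends the game that early, and the fraction keeps shrinking toward zero as the number of flips grows. The alternating sequence still survives at every length, which is the sense in which the event has probability 0 without being impossible.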
        Regex 22 Aug 2015 15:01 UTC
        0 points
        Parent
        If the limit converges then it can hit 0 or 1. Got it. Thank you.