[SEQ RERUN] 0 And 1 Are Not Probabilities
Today’s post, 0 And 1 Are Not Probabilities was originally published on 10 January 2008. A summary (taken from the LW wiki):
In the ordinary way of writing probabilities, 0 and 1 both seem like entirely reachable quantities. But when you transform probabilities into odds ratios, or log-odds, you realize that getting a proposition to probability 1 would require an infinite amount of evidence.
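As a rough illustration of the log-odds framing (my own sketch, not from the post; the probabilities below are arbitrary), converting p into log2(p / (1 − p)) shows how far away probability 1 really is:

```python
import math

def log_odds_bits(p):
    """Convert a probability into log-odds, measured in bits of evidence."""
    return math.log2(p / (1 - p))

# Each extra factor of ten in the odds costs a roughly constant ~3.3 bits
# of further evidence, so probability 1 sits infinitely far up the scale.
for p in [0.5, 0.9, 0.99, 0.999, 0.999999]:
    print(f"p = {p:<9} log-odds = {log_odds_bits(p):6.2f} bits")
```

On this scale evidence adds up additively, and no finite pile of it ever reaches infinity, which is the post's point.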
Discuss the post here (rather than in the comments to the original post).
This post is part of the Rerunning the Sequences series, where we’ll be going through Eliezer Yudkowsky’s old posts in order so that people who are interested can (re-)read and discuss them. The previous post was Infinite Certainty, and you can use the sequence_reruns tag or rss feed to follow the rest of the series.
Sequence reruns are a community-driven effort. You can participate by re-reading the sequence post, discussing it here, posting the next day’s sequence reruns post, or summarizing forthcoming articles on the wiki. Go here for more details, or to have meta discussions about the Rerunning the Sequences series.
Well, I’ve a problem with attributing a non-1 probability to the laws of probability. Not that I can’t conceive of them being false, but if they are false, any reasoning done with probabilities is wrong anyway.
Put differently: P(A|A) = 1 is true by definition. And I claim that when you write P(A) and apply probability theorems to it, you’re in fact manipulating P(A|the laws of probability). So P(an axiom of probability theory) is in fact P(an axiom of probability theory|the laws of probability) = 1.
For theorems, you can say that P(Bayes’ Theorem) is not 1, because even if the axioms of probability theory are true, we may have made a mistake in proving Bayes’ Theorem from them. But as soon as you actually use Bayes’ Theorem to obtain a P(A), what you obtain is in fact P(A|Bayes’ Theorem).
Successful use would count as evidence for the laws of probability providing “good” values, right? So if we use these laws quite a bit and they always work, we might have P(Laws of Probability do what we think they do) = 0.99999, and we could discount our output using this. We could also be more constructive and discount based on the complexity of the derivation, using the principle “long proofs are less likely to be correct”, in the following way: each derivation can be expressed as a combination of various sub-derivations, so we could get probability bounds for new, longer derivations from our priors over the sub-derivations from which they are assembled (derivations here meaning the general form of the computation rather than the value-specific one); see the sketch below.
ETA: Wait, were you sort of diagonalizing on Bayes Theorem because we need to use that to update P(Bayes Theorem)? If so I might have misread you.
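A minimal sketch of the discounting idea from the comment above, assuming (purely for illustration) that each sub-derivation is carried out correctly with some independent probability:

```python
from functools import reduce

def p_derivation_correct(p_subderivations, p_laws=0.99999):
    """Crude bound: trust the result only if the laws of probability do what
    we think they do AND every sub-derivation it is assembled from was
    carried out correctly (independence assumed for simplicity)."""
    return p_laws * reduce(lambda a, b: a * b, p_subderivations, 1.0)

# A short derivation built from three well-tested sub-derivations...
print(p_derivation_correct([0.999] * 3))    # ~0.99699
# ...versus a long one chaining thirty of them together.
print(p_derivation_correct([0.999] * 30))   # ~0.97042
```

Because every factor is at most 1, adding steps can only lower the bound, which is the “long proofs are less likely to be correct” principle in arithmetic form.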
I think this is kind of funny, considering that the second axiom of probability states that the entire sample space has probability one. It’s just a simple way to define the system, like how the axioms of Euclidean geometry are simpler if you have a point at infinity. It doesn’t necessarily mean anything. I just find it kind of funny.
I believe that a part of the post’s point is that the entire sample space is hard to find in most real-life cases. From the post:
EDIT: Another example, this time from Martin Gardner’s excellent book Mathematical Games:
Jaynes didn’t like Kolmogorov’s axioms, and I expect Eliezer would agree. I remember he mentioned somewhere in the sequences that he thought probability could be axiomatized without reference to probabilities of 0 or 1, but it wouldn’t have much practical use to do so.
Jaynes definitely believed in 0 and 1 probabilities. In Probability Theory: The Logic of Science, equation (2.71), he gives
P(B | A, (A implies B)) = 1
P(A | not B, (A implies B)) = 0
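One way to see where these come from (a standard product-rule argument, sketched here rather than quoted from the book): writing the conditioning information as C = (A implies B), the implication rules out the joint event “A and not-B”, so its probability given C is zero, and the product rule does the rest:

```latex
% Product rule, assuming P(A \mid C) > 0:
P(A\bar{B} \mid C) = P(A \mid C)\,P(\bar{B} \mid A, C) = 0
  \;\Longrightarrow\; P(\bar{B} \mid A, C) = 0
  \;\Longrightarrow\; P(B \mid A, C) = 1.

% Product rule the other way, assuming P(\bar{B} \mid C) > 0:
P(A\bar{B} \mid C) = P(\bar{B} \mid C)\,P(A \mid \bar{B}, C) = 0
  \;\Longrightarrow\; P(A \mid \bar{B}, C) = 0.
```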
Remember that probabilities are relative to a state of information. If X is a state of information from which we can infer A via deductive logic, then P(A | X) = 1 necessarily. Some common cases of this are
- A is a tautology,
- we are doing some sort of case analysis and X represents one of the cases being considered, or
- we are investigating the consequences of some hypothesis and X represents the hypothesis.
However, Eliezer’s fundamental point is correct when we turn to the states of information of rational beings and propositions that are not tautologies or theorems. If a person’s state of information is X, and P(A | X) = 1, then no amount of contrary evidence can dissuade that person from believing A. This does not sound like rational behavior, unless A is necessarily true (in the mathematical sense of being a tautology or theorem).
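A quick numerical illustration of that last point (my own sketch, not part of the comment): with a prior of exactly 1, Bayes’ rule leaves the posterior at 1 no matter how crushing the evidence against A, while any prior short of 1 still gets moved:

```python
def posterior(prior, likelihood_ratio):
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio,
    where likelihood_ratio = P(evidence | A) / P(evidence | not A)."""
    if prior == 1.0:
        return 1.0  # infinite prior odds: no finite likelihood ratio can move them
    prior_odds = prior / (1 - prior)
    post_odds = prior_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

against_A = 1e-12  # evidence a trillion times more likely if A is false
print(posterior(1.0, against_A))       # 1.0    -- certainty never updates
print(posterior(0.999999, against_A))  # ~1e-06 -- near-certainty updates as it should
```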
I did not say that he didn’t. I said that he didn’t like Kolmogorov’s axioms. You can also derive Bayes’ rule from Kolmogorov’s axioms; that doesn’t mean Jaynes didn’t believe in Bayes’ rule.
I don’t know what single thing it means to not like a set of axioms, so I’m not sure what you mean.
I meant that he didn’t think they were the best way to describe probability. IIRC, he thought that they didn’t make it clear why the structure they describe is the right way to handle uncertainty. He may also have said that they allow you to talk about certain objects that don’t really correspond to any epistemological concepts. You can find his criticism in one of the appendices to Probability Theory: The Logic of Science.
I think this idea is overrated by LWers. It’s true that if you make an argument that P(A) = 1 then it does not follow that P(A) = 1 because you might be wrong. There is nothing really special about 1 here: it’s also true that if you make an argument that P(A) = 2⁄3 then it does not follow that P(A) = 2⁄3 because you might be wrong. The only reason to even mention it is that it’s a common special case: many arguments, in particular most mathematical proofs, do not involve probability, and so their output consists of P(A) = 1 or P(A) = 0; also, mathematical proofs tend to be correct with a very high probability, so P(A|proof of A) is very close to 1.
So does it follow that we should avoid probabilities of 0 and 1 in our reasoning? I don’t think it does, and I think that doing so becomes more and more pointless as your arguments become more and more mathematically rigorous. Probabilities of 0 and 1 are just too useful to discard because someone might get confused. Sure, if you’re manually setting priors for your Bayesian AI, you should be aware that giving a prior of 0 or 1 for a statement means it will never update. But to how many of us is that relevant?
A similar idea is much better explained in Confidence Levels Inside and Outside an Argument. In my opinion, any part of this post that is not also covered there is not worth reading.