There are different levels of impossible.
Imagine a universe with an infinite number of identical rooms, each of which contains a single human. Each room is numbered outside: 1, 2, 3, …
The probability of you being in the first 100 rooms is 0 - if you ever have to make an expected utility calculation, you shouldn’t even consider that chance. On the other hand, it is definitely possible in the sense that some people are in those first 100 rooms.
If you consider the probability of you being in room Q, this probability is also 0. However, it (intuitively) feels “more” impossible.
I don’t really think this line of thought leads anywhere interesting, but it definitely violated my intuitions.
There is no such thing as a uniform probability distribution over a countably infinite event space (see Toggle’s comment). The distribution you’re assuming in your example doesn’t exist.
Maybe a better example for your purposes would be picking a random real number between 0 and 1 (this does correspond to a possible distribution, assuming the axiom of choice is true). The probability of the number being rational is 0, the probability of it being greater than 2 is also 0, yet the latter seems “more impossible” than the former.
Of course, this assumes that “probability 0” entails “impossible”. I don’t think it does. The probability of picking a rational number may be 0, but it doesn’t seem impossible. And then there’s the issue of whether the experiment itself is possible. You certainly couldn’t construct an algorithm to perform it.
Given uncountable sample space, P(A)=0 does not necessarily imply that A is impossible. A is impossible iff the intersection of A and sample space is empty.
Intuitively speaking, one could say that P(A)=0 means that A resembles “a miracle” in a sense that if we perform n independent experiments, we still cannot increase the probability that A will happen at least once even if we increase n. Whereas if P(B)>0, then by increasing number of independent experiments n we can make probability of B happening at least once approach 1.
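The "miracle" intuition above can be checked numerically. This is only a sketch: the function name and the sample value p = 0.001 are illustrative assumptions, not anything from the thread. It uses the standard formula 1 - (1 - p)^n for the probability that an event occurs at least once in n independent trials.

```python
def prob_at_least_once(p: float, n: int) -> float:
    """Probability that an event of probability p occurs at least once
    in n independent trials: 1 - (1 - p)^n."""
    return 1.0 - (1.0 - p) ** n


# Compare a probability-zero event A with a rare event B (p = 0.001 is an
# arbitrary illustrative value).
for n in (1, 10, 1_000, 100_000):
    print(n, prob_at_least_once(0.0, n), prob_at_least_once(0.001, n))
```

For p = 0 the result stays at 0 however large n gets, while for any p > 0 it climbs toward 1.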
I (now) understand the problem with using a uniform probability distribution over a countably infinite event space. However, I’m kind of confused when you say that the example doesn’t exist. Surely, it’s not logically impossible for such an infinite universe to exist. Do you mean that probability theory isn’t expressive enough to describe it?
When I say the probability distribution doesn’t exist, I’m not talking about the possibility of the world you described. I’m talking about the coherence of the belief state you described. When you say “The probability of you being in the first 100 rooms is 0”, it’s a claim about a belief state, not about the mind-independent world. The world just has a bunch of rooms with people in them. A probability distribution isn’t an additional piece of ontological furniture.
If you buy the Cox/Jaynes argument that your beliefs must adhere to the probability calculus to be rationally coherent, then assigning probability 0 to being in any particular room is not a coherent set of beliefs. I wouldn’t say this is a case of probability theory not being “expressive enough”. Maybe you want to argue that the particular belief state you described (“Being in any room is equally likely”) is clearly rational, in which case you would be rejecting the idea that adherence to the Kolmogorov axioms is a criterion for rationality. But do you think it is clearly rational? On what grounds?
(Incidentally, I actually do think there are issues with the LW orthodoxy that probability theory limns rationality, but that’s a discussion for another day.)
From a decision-theory perspective, I should essentially just ignore the possibility that I’m in the first 100 rooms—right?
Similarly, suppose I’m born in a universe with infinitely many such rooms and someone tells me to guess whether my room number is a multiple of 10 or not. If I guess correctly, I get a dollar; otherwise I lose a dollar.
Theoretically there are as many multiples of 10 as not (both being equinumerous to the integers), but if we define rationality as the “art of winning”, then shouldn’t I guess “not in a multiple of 10”? I admit that my intuition may be broken here (maybe it truly doesn’t matter which you guess); after all, it’s not like we can sample a bunch of people born into this world without some sampling function. However, doesn’t the question still remain: what would a rational being do?
Well, what do you mean by “essentially ignore”? If you’re asking whether I shouldn’t assign substantial credence to the possibility, then yeah, I’d agree. If you’re asking whether I should assign literally zero credence to the possibility, so that there are no possible odds, no matter how ridiculously skewed, at which I would bet that I am in one of those rooms… well, now I’m no longer sure. I don’t exactly know how to go about setting my credences in the world you describe, but I’m pretty sure assigning 0 probability to every single room isn’t it.
Consider this: Let’s say you’re born in this universe. A short while after you’re born, you discover a note in your room saying, “This is room number 37”. Do you believe you should update your belief set to favor the hypothesis that you’re in room 37 over any other number? If you do, it implies that your prior for the belief that you’re in one of the first 100 rooms could not have been 0.
(But, on the other hand, if you think you should update in favor of being in room x when you encounter a note saying “You are in room x”, no matter what the value of x, then you aren’t probabilistically coherent. So ultimately, I don’t think intuition-mongering is very helpful in these exotic scenarios. Consider my room 37 example as an attempt to deconstruct your initial intuition, rather than as an attempt to replace it with some other intuition.)
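The claim that a prior of exactly 0 cannot be updated follows directly from Bayes’ theorem, and a small sketch makes it concrete. The function name and all the numbers below (how reliable the note is, the tiny nonzero prior) are made-up assumptions for illustration only.

```python
def bayes_update(prior: float, p_data_given_h: float, p_data_given_not_h: float) -> float:
    """Posterior P(H | D) for a binary hypothesis H via Bayes' theorem."""
    evidence = p_data_given_h * prior + p_data_given_not_h * (1.0 - prior)
    if evidence == 0.0:
        raise ValueError("the data has probability zero under this model")
    return p_data_given_h * prior / evidence


# H = "I am in room 37", D = "I found a note saying this is room 37".
# The likelihoods are hypothetical: the note is far more probable if H is true.
print(bayes_update(0.0, 0.99, 1e-6))   # a prior of exactly 0 stays at 0
print(bayes_update(1e-9, 0.99, 1e-6))  # a tiny nonzero prior can grow
```

However strong the evidence, the posterior is the prior multiplied by a finite factor, so a prior of 0 stays at 0.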
Perhaps, but reproducing this result doesn’t require that we consider every room equally likely. For instance, a distribution that attaches a probability of 2^(-n) to being in room n will also tell you to guess that you’re not in a multiple of 10. And it has the added advantage of being a possible distribution. It has the apparent disadvantage of arbitrarily privileging smaller numbered rooms, but in the kind of situation you describe, some such arbitrary privileging is unavoidable if you want your beliefs to respect the Kolmogorov axioms.
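As a rough check of the 2^(-n) example, the sketch below sums that distribution over the multiples of 10. The helper names are assumptions, and the sums are cut off at room 200 only to keep the computation finite; the neglected tail is 2^(-200).

```python
from fractions import Fraction


def room_prob(n: int) -> Fraction:
    """Probability 2^(-n) of being in room n (n = 1, 2, 3, ...)."""
    return Fraction(1, 2 ** n)


def prob_multiple_of_10(max_room: int) -> Fraction:
    """Total probability of rooms 10, 20, 30, ... up to max_room."""
    return sum((room_prob(n) for n in range(10, max_room + 1, 10)), Fraction(0))


total = sum((room_prob(n) for n in range(1, 201)), Fraction(0))
p10 = prob_multiple_of_10(200)

print(float(total))  # the total mass approaches 1
print(float(p10))    # close to 1/1023, the exact value of the infinite sum
```

The infinite sum over the multiples of 10 is a geometric series with value 1/(2^10 - 1) = 1/1023, so under this distribution guessing “not a multiple of 10” wins with probability 1022/1023.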
What I mean by “essentially ignore” is that if you are (for instance) offered the following bet you would probably accept: “If you are in the first 100 rooms, I kill you. Otherwise, I give you a penny.”
I see your point regarding the fact that updating using Bayes’ theorem implies your prior wasn’t 0 to begin with.
I guess my question is now whether there are any extended versions of probability theory. For instance, Kolmogorov probability reverts to Aristotelian logic for the extremes P=1 and P=0. Is there a system of thought that reverts to probability theory for finite worlds but is able to handle infinite worlds without privileging certain (small) numbers?
I will admit that I’m not even sure that guessing “not a multiple of 10” follows the art of winning, since in traditional probability/statistics you can’t sample from an infinite set of rooms without some kind of sampling function that biases certain numbers. At best we can say that, for any finite number of rooms N, however large, the best strategy is to guess “not a multiple of 10”. By induction we can prove that this holds for every finite number of rooms, but alas, infinity remains beyond this.
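The finite-N claim can be made concrete with a short sketch. It assumes a world with exactly N rooms and a uniform chance of being in each, which is well-defined only because N is finite; the function names are illustrative.

```python
from fractions import Fraction


def fraction_multiples_of_10(n_rooms: int) -> Fraction:
    """Share of rooms 1..n_rooms whose number is a multiple of 10."""
    return Fraction(n_rooms // 10, n_rooms)


def expected_gain_guessing_not(n_rooms: int) -> Fraction:
    """Expected dollars from guessing "not a multiple of 10":
    win $1 with probability 1 - p, lose $1 with probability p."""
    p = fraction_multiples_of_10(n_rooms)
    return (1 - p) - p


for n in (7, 10, 15, 1_000, 1_000_003):
    print(n, fraction_multiples_of_10(n), expected_gain_guessing_not(n))
```

For every finite N the share of multiples of 10 is at most 1/10, so the expected gain from guessing “not a multiple of 10” is at least 4/5 of a dollar. This says nothing by itself about the infinite case, where the answer depends on how the limit is taken.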
Your math has some problems. Note that if p(X=x) = 0 for all x, then the sum over all x is also zero. But if you’re in a room, then by definition you have sampled from the set of rooms, so the probability of selecting some room is one. Since the probability of selecting ‘any room from the set of rooms’ is both zero and one, we have established a contradiction, so the problem is ill-posed.
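Written out, the contradiction looks roughly like this. The constant c, standing for the common probability of each room, is notation introduced here, and the second equality is where countable additivity is used.

```latex
\[
1 \;=\; P\Big(\bigcup_{n=1}^{\infty} \{X = n\}\Big)
  \;=\; \sum_{n=1}^{\infty} P(X = n)
  \;=\; \sum_{n=1}^{\infty} c
  \;=\;
  \begin{cases}
    0      & \text{if } c = 0, \\
    \infty & \text{if } c > 0.
  \end{cases}
\]
```

Neither case equals 1, so no constant c gives a countably additive uniform distribution over the rooms.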
As others have pointed out, there is no uniform probability distribution on a countable set. There are various generalisations of probability that drop or weaken the axiom of countable additivity, which have their uses, but one statistician’s conclusion is that you lose too many useful properties. On the other hand, writing a blog post to describe something as a lost cause suggests that it still has adherents. Googling /“finite additivity” probability/ turns up various attempts to drop countable additivity.
Another way of avoiding the axiom is to reject all infinities. There are then no countable sets to be countably additive over. This throws out almost all of current mathematics, and has attracted few believers.
In some computations involving probabilities, the axiom that the measure over the whole space is 1 plays no role. A notable example is the calculation of posterior probabilities from priors and data by Bayes’ Theorem:
Posterior(H|D) = P(D|H) Prior(H) / Sum_H’ ( P(D|H’) Prior(H’) )
(H, H’ = hypothesis, D = data.)
The total measure of the prior cancels out of the numerator and denominator. This allows the use of “improper” priors that can have an infinite total measure, such as the one that assigns measure 1 to every integer and infinite measure to the set of all integers.
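A minimal sketch of how the prior’s total measure cancels is given below. The likelihood is a made-up assumption (a noisy note reading 37), and the hypothesis set is cut off at room 200 only so the sum can be computed; the neglected tail of this likelihood is negligible.

```python
import math


def posterior(likelihood, prior_weight, hypotheses):
    """Bayes' theorem with an unnormalised prior: multiply likelihood by
    prior weight, then normalise over the hypotheses."""
    weights = {h: likelihood(h) * prior_weight(h) for h in hypotheses}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}


def likelihood(n: int) -> float:
    """Hypothetical P(note reads 37 | room = n) = (1/3) * 2^(-|n - 37|)."""
    return (1.0 / 3.0) * 2.0 ** (-abs(n - 37))


rooms = range(1, 201)  # truncation for computability, not part of the argument

flat = posterior(likelihood, lambda n: 1.0, rooms)       # measure 1 per room
scaled = posterior(likelihood, lambda n: 1000.0, rooms)  # measure 1000 per room

print(flat[37])
print(math.isclose(flat[37], scaled[37]))
```

Scaling the prior by any positive constant leaves the posterior unchanged, which is why a flat prior with infinite total measure can still yield a proper posterior whenever the likelihood is summable.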
There can be a uniform probability distribution over an uncountable set, because there is no requirement for a probability distribution to be uncountably additive. Every sample drawn from the uniform distribution over the unit interval has a probability 0 of being drawn. This is just one of those things that one comes to understand by getting used to it, like square roots of −1, 0.999...=1, non-euclidean geometry, and so on.
As I recall, Teddy Seidenfeld is a fan of finite additivity. He does decision theory work, also.
Do you know why?
The recent thread on optional stopping and Bayes led me to this paper, which I see Seidenfeld is one of the authors of, which argues that countable additivity has bad consequences. But these consequences are a result of improper handling of limits, as Jaynes sets forth in his chapter 15. Seidenfeld and his coauthors go to great lengths (also here) exploring the negative consequences of finite additivity for Bayesian reasoning. They see this as a problem for Bayesian reasoning rather than for finite additivity. But I have not seen their motivation.
If you’re going to do probability on infinite spaces at all, finite additivity just seems to me to be an obviously wrong concept.
ETA: Here’s another paper by Seidenfeld, whose title does rather suggest that it is going to argue against finite additivity, but whose closing words decline to resolve the matter.
I opine that you are equivocating between “tends to zero as N tends to infinity” and “is zero”. This is usually a very bad idea.
Measure theory is a tricky subject. Also consider https://twitter.com/ZachWeiner/status/625711339520954368 .
Could you recommend a good source from which to learn measure theory?
This is an old problem in probability theory, and there are different solutions.
PT is developed first for finite models, so it’s natural that its extension to infinite models can be done in a few different ways.
Could you point me to some solutions?
They have already been pointed out to you: either extend PT to use some kind of measure (Jaynes’ solution), or use only distributions that have a definite limit when extended to the infinite, or use infinitely small quantities.