Those things aren’t opposed to one another. It is perfectly possible for me to choose an integer with such a probability distribution: 1⁄2 for the number 1; 1⁄4 for the number 2; 1⁄8 for the number 3; and so on. The probabilities will sum to 1, there will be an infinite number of probabilities, and none will be zero. It is perfectly reasonable.
Also, it will not fundamentally change the theorem if some of the probabilities are allowed to be zero.
Those things aren’t opposed to one another. It is perfectly possible for me to choose an integer with such a probability distribution: 1⁄2 for the number 1; 1⁄4 for the number 2; 1⁄8 for the number 3; and so on. The probabilities will sum to 1, there will be an infinite number of probabilities, and none will be zero. It is perfectly reasonable.
That is a good point. My objection is clearly incompletely expressed. It is my impression that your reasoning relied on forcing the probability distribution to follow tricks like the above in order to rule out “The distribution is just uniform. That’s it. They’re all the freaking same probability.” That is, I expected you forbid a distribution made up of 1/infinity while insisting that the distribution is infinitely large by saying “but zero isn’t a probability”.
Basically, if you say there are infinitely many options then I say each option is allowed to be 1/infinity and that infinity times 1 divided by infinity is 1. Yes, it looks like adding together lots of zeros to get one, but hey, if you mess with infinity then expect that sort of thing. I’m sure there are special mathematical ways to refer to the “1/infinity” kind of zero but I just say that if you can have infinitely many things you can have infinitely many infinitely small identical things.
Also, it will not fundamentally change the theorem if some of the probabilities are allowed to be zero.
Really? That is interesting. That’s an even stronger claim than the one you made in the post. You are confident that it holds even when you remove the “since the probabilities can not be zero” part?
Yes, it looks like adding together lots of zeros to get one, but hey, if you mess with infinity then expect that sort of thing. I’m sure there are special mathematical ways to refer to the “1/infinity” kind of zero but I just say that if you can have infinitely many things you can have infinitely many infinitely small identical things.
You can have a uniform distribution over a continuous space, but not over integers. The reason is that probability measures are supposed to be countably additive, so if each integer has weight 0, they all together must have weight 0. To make a uniform distribution over infinitely many hypotheses, you need them to be an uncountable set (e.g. parameterize them somehow by real numbers between 0 and 1). This is exactly what happens when your hypotheses talk about an unknown real parameter: each individual value has probability 0, but an interval of values has positive probability that you obtain by integrating the prior over that interval. If you can extend this idea to hypotheses indexed by complexity, this will be a Major Breakthrough, but I think it’s very hard.
You can have a uniform distribution over a continuous space, but not over integers.
I think it’s worth adding that while you can have a uniform distribution over some continuous spaces, like the real numbers between 0 and 1, you can’t have a uniform distribution over all real numbers.
How can you do that? I thought 0-1 had the same cardinality as all reals, and that the cardinality of all reals is too high to have a uniform distribution.
I’m completely flummoxed by the level of discussion here in the comments to Unknowns’s post. When I wrote a post on logic and most commenters confused truth and provability, that was understandable because not everyone can be expected to know mathematical logic. But here we see people who don’t know how to sum or reorder infinite series, don’t know what a uniform distribution is, and talk about “the 1/infinity kind of zero”. This is a rude wakeup call. If we want to discuss issues like Bayesianism, quantum mechanics or decision theory, we need to take every chance to fill the gaps in our understanding of math.
To answer your question: [0,1] does have the same cardinality as all reals, so in the set-theoretic sense they’re equivalent. But they are more than just sets: they come equipped with an additional structure, “measure”). A probability distribution can only be defined as uniform with regard to some measure. The canonical measure of the whole [0,1] is 1, so you can set up a uniform distribution that says the probability of each measurable subset is the measure of that subset (alternatively, the integral of the constant function f(x)=1 over that subset). But the canonical measure of the whole real line is infinite, so you cannot build a uniform distribution with respect to that.
After you’re comfortable with the above idea, we can add another wrinkle: even though we cannot set up a uniform distribution over all reals, we can in some situations use a uniform prior over all reals. Such things are called improper priors and rely on the lucky fact that the arithmetic of Bayesian updating doesn’t require the integral of the prior to be 1, so in well-behaved situations even “improper” prior distributions can always give rise to “proper” posterior distributions that integrate to 1.
I’m completely flummoxed by the level of discussion here in the comments to Unknowns’s post. When I wrote a post on logic and most commentors confused truth and provability, that was understandable because not everyone can be expected to know mathematical logic. But here we see people who don’t know how to sum or reorder infinite series, don’t know what a uniform distribution is, and talk about “the 1/infinity kind of zero”. This is a rude wakeup call. If we want to discuss issues like Bayesianism, quantum mechanics or decision theory, we need to take every chance to fill the gaps in our understanding of math.
Yeah, it might help to make a list of “math recommended for reading and posting on Less Wrong” Unfortunately, the entire set is likely large enough such that even many physics majors won’t have all of it (lots of physics people don’t take logic or model theory classes). At this point the list of math topics seems to include Lebesque integration, Godel’s theorems and basic model theory, basics of continuous and discrete probability spaces, and a little bit of theoretical compsci ranging over a lot of topics (both computability theory and complexity theory seem relevant). Some of the QM posts also require a bit of linear algebra to actually grok well but I suspect that anyone who meets most of the rest of the list will have that already. Am I missing any topics?
Not all LW participants need to know advanced math. Human rationality is a broad topic, and someone like Yvain or Alicorn can contribute a lot without engaging the math side of things. What I care about is the signal-to-noise ratio in math discussion threads. In other words, I’m okay with regular ignorance but hate vocal ignorance with a fiery passion.
I propose a simple heuristic: if you see others using unfamiliar math, look it up before commenting. If it relies on other concepts that you don’t understand, look them up too, and so on. Yes, it might take you days of frantic digging. It sometimes takes me days full-time, even though I have a math degree. Time spent learning more math is always better invested than time spent in novice-level discussions.
That seems valid, but part of the trouble also seems to be people thinking they understand math that they don’t after reading popularizations. I’m not sure how to deal with that other than just having those individuals read the actual math.
We need merely a norm to tell people to stop, not a magic way of explaining math faster than possible. Also, more realistic timeline for grokking (as opposed to parsing) deeper concepts is months, not days, and that’s if you are smart enough in the first place.
In getting to some of those things, there are some more basic subjects that would have to be mastered:
Algebra: giving names to unknown numerical quantities and then reasoning about them the way one would with actual numbers.
Calculus: the relationship between a rate of change and a total amount, and the basic differential equations of physics, e.g. Newtonian mechanics, the diffusion equation, etc.
If this all seems like a lot, many people spend until well into their twenties in school. Think where they would get to if all that time had been usefully spent!
Gosh, now I don’t know whether to feel bad or not for asking that question.
But I guess ‘no, it’s not just cardinality that matters but measure’ is a good answer. Is there any quick easy explanation of measure and its use in probability?
(I have yet to learn anything from Wikipedia on advanced math topics. You don’t hear bad things about Wikipedia’s math topics, but as Bjarne said of C++, complaints mean there are users.)
Don’t feel bad, you’re actually a hero. The four levels of depravity, by cousin_it:
Ask an illiterate question.
Assert an illiterate statement.
Assert and violently defend an illiterate statement.
Assert and violently defend an illiterate statement, becoming offended and switching to ad hominems when you begin to lose.
I’m not sure if other people can successfully learn math topics from Wikipedia because I’m atypical and other-optimizing is hard. Anyway, here’s how you tell whether you actually understand a math topic: you should be able to solve simple problems. For example, someone who can’t find Nash equilibria in simple 2x2 games is unqualified to talk about the Prisoner’s Dilemma, and someone who can’t solve the different Monty Hall variants is unqualified to talk about Bayesianism and priors.
Also, I second this observation of Wikipedia’s math pages not being a good learning resource. They’re pretty good as reference and for refreshers on stuff you’ve already learned, but it’s no substitute for a decent textbook and/or instructor.
Yes, it’s uncountable, so the probability of any particular real number is zero. I’m not going to go into detail by what I mean by a probability distribution here (though I’ll note that cousin it used the word “integrate”). To uniformly pick a number $x$ from 0 to 1, use an infinite sequence of coin flips to represent it in binary. Infinity is impractical, but for natural questions, like: is $x$ between 1⁄5 and 1/sqrt(2), you will almost surely need only finitely many flips (and your expected number is also finite). And the probability that the answer is yes is 1/sqrt(2)-1/5; that is the sense in which it is uniform.
You can transfer this distribution to the set of all real numbers, eg, by logit or arctan, but it won’t be uniform in the same sense. One can satisfy that uniformity by a measure which not a probability measure.
It won’t change “fundamentally” i.e. in any important way, but it will slightly change the meaning of the probability decreasing on average, because for example, the probability of all hypotheses of complexity 17 might be 0, and if this is the case then there is no greater complexity which will give a lower probability for this particular value. But in any case either the terms will sum to 1 after a finite number of values, in which case the probability will drop permanently to 0 after that, or it will not, in which case the theorem will continue to hold as applied to all the hypotheses that have some positive probability.
Just because there are infinitely many options doesn’t mean that it’s reasonable to assign a probability of 1/infinity. These options are claims about reality that you make with a language; saying that their probability is such a value is much the same as if I said that the probability that you are right in this disagreement is 1/infinity. If someone can say this in the context of my argument, he can say it any context, and this would not be standard probability theory.
Is the claim “you can have probability distributions with infinitely many options and no such distribution can be uniform” a premise or conclusion of your argument?
I don’t agree with the above but even if I did I would assert that it is trivial to reverse the order of ranking such that the arbitrary limitations on the probability distribution lead you to completely the opposite conclusion.
You can’t reverse the order, because there is a least element but no greatest element. So they can’t be swapped round.
The time taken to reverse the ordering of an ordered list depends on the size of the list. Yes, if the size is infinite you cannot reverse it. Or print it out. Or sum it. Or find the median. Infinity messes things up. It certainly doesn’t lead us to assuming:
“Things can be considered in order of complexity but not in order of simplicity (1/compexity) and therefore the most complex things must be lower probability than simple things.”
I say this is just demonstrating “things you shouldn’t do with infinity”, not proving Occams Razor.
the most complex things must be lower probability than simple things
That is not a conclusion. By the definitions given, there is no “most complex thing”; rather, for any given element you pick out that you might think is the most complex thing, there are infinitely many that are more complex.
The claim you mention is a conclusion from standard probability theory.
The order can’t be reversed, just as you can’t reverse the order of the natural numbers to make them go from high to low. Yes, you could start at a google and count down to 1, but then after that you would be back at counting up again. In the same way every possible ordering of hypotheses will sooner or later start ordering them from simpler to more complex.
Those things aren’t opposed to one another. It is perfectly possible for me to choose an integer with such a probability distribution: 1⁄2 for the number 1; 1⁄4 for the number 2; 1⁄8 for the number 3; and so on. The probabilities will sum to 1, there will be an infinite number of probabilities, and none will be zero. It is perfectly reasonable.
Also, it will not fundamentally change the theorem if some of the probabilities are allowed to be zero.
That is a good point. My objection is clearly incompletely expressed. It is my impression that your reasoning relied on forcing the probability distribution to follow tricks like the above in order to rule out “The distribution is just uniform. That’s it. They’re all the freaking same probability.” That is, I expected you forbid a distribution made up of 1/infinity while insisting that the distribution is infinitely large by saying “but zero isn’t a probability”.
Basically, if you say there are infinitely many options then I say each option is allowed to be 1/infinity and that infinity times 1 divided by infinity is 1. Yes, it looks like adding together lots of zeros to get one, but hey, if you mess with infinity then expect that sort of thing. I’m sure there are special mathematical ways to refer to the “1/infinity” kind of zero but I just say that if you can have infinitely many things you can have infinitely many infinitely small identical things.
Really? That is interesting. That’s an even stronger claim than the one you made in the post. You are confident that it holds even when you remove the “since the probabilities can not be zero” part?
You can have a uniform distribution over a continuous space, but not over integers. The reason is that probability measures are supposed to be countably additive, so if each integer has weight 0, they all together must have weight 0. To make a uniform distribution over infinitely many hypotheses, you need them to be an uncountable set (e.g. parameterize them somehow by real numbers between 0 and 1). This is exactly what happens when your hypotheses talk about an unknown real parameter: each individual value has probability 0, but an interval of values has positive probability that you obtain by integrating the prior over that interval. If you can extend this idea to hypotheses indexed by complexity, this will be a Major Breakthrough, but I think it’s very hard.
I think it’s worth adding that while you can have a uniform distribution over some continuous spaces, like the real numbers between 0 and 1, you can’t have a uniform distribution over all real numbers.
How can you do that? I thought 0-1 had the same cardinality as all reals, and that the cardinality of all reals is too high to have a uniform distribution.
I’m completely flummoxed by the level of discussion here in the comments to Unknowns’s post. When I wrote a post on logic and most commenters confused truth and provability, that was understandable because not everyone can be expected to know mathematical logic. But here we see people who don’t know how to sum or reorder infinite series, don’t know what a uniform distribution is, and talk about “the 1/infinity kind of zero”. This is a rude wakeup call. If we want to discuss issues like Bayesianism, quantum mechanics or decision theory, we need to take every chance to fill the gaps in our understanding of math.
To answer your question: [0,1] does have the same cardinality as all reals, so in the set-theoretic sense they’re equivalent. But they are more than just sets: they come equipped with an additional structure, “measure”). A probability distribution can only be defined as uniform with regard to some measure. The canonical measure of the whole [0,1] is 1, so you can set up a uniform distribution that says the probability of each measurable subset is the measure of that subset (alternatively, the integral of the constant function f(x)=1 over that subset). But the canonical measure of the whole real line is infinite, so you cannot build a uniform distribution with respect to that.
After you’re comfortable with the above idea, we can add another wrinkle: even though we cannot set up a uniform distribution over all reals, we can in some situations use a uniform prior over all reals. Such things are called improper priors and rely on the lucky fact that the arithmetic of Bayesian updating doesn’t require the integral of the prior to be 1, so in well-behaved situations even “improper” prior distributions can always give rise to “proper” posterior distributions that integrate to 1.
Yeah, it might help to make a list of “math recommended for reading and posting on Less Wrong” Unfortunately, the entire set is likely large enough such that even many physics majors won’t have all of it (lots of physics people don’t take logic or model theory classes). At this point the list of math topics seems to include Lebesque integration, Godel’s theorems and basic model theory, basics of continuous and discrete probability spaces, and a little bit of theoretical compsci ranging over a lot of topics (both computability theory and complexity theory seem relevant). Some of the QM posts also require a bit of linear algebra to actually grok well but I suspect that anyone who meets most of the rest of the list will have that already. Am I missing any topics?
Not all LW participants need to know advanced math. Human rationality is a broad topic, and someone like Yvain or Alicorn can contribute a lot without engaging the math side of things. What I care about is the signal-to-noise ratio in math discussion threads. In other words, I’m okay with regular ignorance but hate vocal ignorance with a fiery passion.
I propose a simple heuristic: if you see others using unfamiliar math, look it up before commenting. If it relies on other concepts that you don’t understand, look them up too, and so on. Yes, it might take you days of frantic digging. It sometimes takes me days full-time, even though I have a math degree. Time spent learning more math is always better invested than time spent in novice-level discussions.
That seems valid, but part of the trouble also seems to be people thinking they understand math that they don’t after reading popularizations. I’m not sure how to deal with that other than just having those individuals read the actual math.
We need merely a norm to tell people to stop, not a magic way of explaining math faster than possible. Also, more realistic timeline for grokking (as opposed to parsing) deeper concepts is months, not days, and that’s if you are smart enough in the first place.
Agreed. I have no idea either.
Well, if someone posts something wrong, call them out on it.
Empirically that doesn’t seem to help much. See for example the comment thread in cousin_it’s last top-level post.
Cousin_it and JoshuaZ, this sounds as though it could be a good topic (or group of topics) for a top-level post.
Do you mean a set of posts on telling when you don’t know enough or a set of posts on the math people should know?
Either or both would be useful, but I was thinking about the latter.
In getting to some of those things, there are some more basic subjects that would have to be mastered:
Algebra: giving names to unknown numerical quantities and then reasoning about them the way one would with actual numbers.
Calculus: the relationship between a rate of change and a total amount, and the basic differential equations of physics, e.g. Newtonian mechanics, the diffusion equation, etc.
If this all seems like a lot, many people spend until well into their twenties in school. Think where they would get to if all that time had been usefully spent!
Why is all that physics necessary? I’m not seeing it.
Practical examples. Not many people are going to plough through abstract mathematics without them.
Causal networks, axiomatizations of expected utility.
Gosh, now I don’t know whether to feel bad or not for asking that question.
But I guess ‘no, it’s not just cardinality that matters but measure’ is a good answer. Is there any quick easy explanation of measure and its use in probability?
(I have yet to learn anything from Wikipedia on advanced math topics. You don’t hear bad things about Wikipedia’s math topics, but as Bjarne said of C++, complaints mean there are users.)
Don’t feel bad, you’re actually a hero. The four levels of depravity, by cousin_it:
Ask an illiterate question.
Assert an illiterate statement.
Assert and violently defend an illiterate statement.
Assert and violently defend an illiterate statement, becoming offended and switching to ad hominems when you begin to lose.
I’m not sure if other people can successfully learn math topics from Wikipedia because I’m atypical and other-optimizing is hard. Anyway, here’s how you tell whether you actually understand a math topic: you should be able to solve simple problems. For example, someone who can’t find Nash equilibria in simple 2x2 games is unqualified to talk about the Prisoner’s Dilemma, and someone who can’t solve the different Monty Hall variants is unqualified to talk about Bayesianism and priors.
You should post this hierarchy somewhere more permanent. It seems… useful.
Here’s a non-wiki explanation of measure. http://www.ams.org/bookstore/pspdf/stml-48-prev.pdf It’s a generalization of the concept of length.
Complaints mean there are annoyed users, yes. :-)
Also, I second this observation of Wikipedia’s math pages not being a good learning resource. They’re pretty good as reference and for refreshers on stuff you’ve already learned, but it’s no substitute for a decent textbook and/or instructor.
Yes, it’s uncountable, so the probability of any particular real number is zero. I’m not going to go into detail by what I mean by a probability distribution here (though I’ll note that cousin it used the word “integrate”). To uniformly pick a number $x$ from 0 to 1, use an infinite sequence of coin flips to represent it in binary. Infinity is impractical, but for natural questions, like: is $x$ between 1⁄5 and 1/sqrt(2), you will almost surely need only finitely many flips (and your expected number is also finite). And the probability that the answer is yes is 1/sqrt(2)-1/5; that is the sense in which it is uniform.
You can transfer this distribution to the set of all real numbers, eg, by logit or arctan, but it won’t be uniform in the same sense. One can satisfy that uniformity by a measure which not a probability measure.
It won’t change “fundamentally” i.e. in any important way, but it will slightly change the meaning of the probability decreasing on average, because for example, the probability of all hypotheses of complexity 17 might be 0, and if this is the case then there is no greater complexity which will give a lower probability for this particular value. But in any case either the terms will sum to 1 after a finite number of values, in which case the probability will drop permanently to 0 after that, or it will not, in which case the theorem will continue to hold as applied to all the hypotheses that have some positive probability.
Just because there are infinitely many options doesn’t mean that it’s reasonable to assign a probability of 1/infinity. These options are claims about reality that you make with a language; saying that their probability is such a value is much the same as if I said that the probability that you are right in this disagreement is 1/infinity. If someone can say this in the context of my argument, he can say it any context, and this would not be standard probability theory.
Is the claim “you can have probability distributions with infinitely many options and no such distribution can be uniform” a premise or conclusion of your argument?
I don’t agree with the above but even if I did I would assert that it is trivial to reverse the order of ranking such that the arbitrary limitations on the probability distribution lead you to completely the opposite conclusion.
You can’t reverse the order, because there is a least element but no greatest element. So they can’t be swapped round.
The time taken to reverse the ordering of an ordered list depends on the size of the list. Yes, if the size is infinite you cannot reverse it. Or print it out. Or sum it. Or find the median. Infinity messes things up. It certainly doesn’t lead us to assuming:
“Things can be considered in order of complexity but not in order of simplicity (1/compexity) and therefore the most complex things must be lower probability than simple things.”
I say this is just demonstrating “things you shouldn’t do with infinity”, not proving Occams Razor.
That is not a conclusion. By the definitions given, there is no “most complex thing”; rather, for any given element you pick out that you might think is the most complex thing, there are infinitely many that are more complex.
Infinity is not magic.
The claim you mention is a conclusion from standard probability theory.
The order can’t be reversed, just as you can’t reverse the order of the natural numbers to make them go from high to low. Yes, you could start at a google and count down to 1, but then after that you would be back at counting up again. In the same way every possible ordering of hypotheses will sooner or later start ordering them from simpler to more complex.