I cannot help but get the impression that this conclusion relies on doing nasty things with infinity. In particular, it seems to take all of the following assumptions:
0 cannot be a probability.
There can be an infinite number of probabilities in a probability distribution.
All the probabilities in a distribution sum to 1.
… and then play them off against each other. If I were one of those high-complexity hypotheses I’d be feeling fairly violated right about now. In fact I can recall someone using similar reasoning in a discussion with Eliezer (it involved throwing darts at a continuous real-valued dartboard). It ended with Eliezer questioning “0 cannot be a probability” in the case of infinite sets. I don’t recall if he also questioned infinite set atheism.
Those things aren’t opposed to one another. It is perfectly possible for me to choose an integer with such a probability distribution: 1⁄2 for the number 1; 1⁄4 for the number 2; 1⁄8 for the number 3; and so on. The probabilities will sum to 1, there will be an infinite number of probabilities, and none will be zero. It is perfectly reasonable.
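For concreteness, here is a minimal Python sketch (an illustration only) of exactly this distribution: probability 1/2^n for the integer n, realized as fair coin flips.

```python
import random

def sample_integer():
    """Sample a positive integer n with probability 1/2**n:
    count fair coin flips up to and including the first heads."""
    n = 1
    while random.random() < 0.5:  # tails: keep flipping
        n += 1
    return n

# No individual weight is zero, and the partial sums tend to 1:
print(sum(0.5 ** n for n in range(1, 51)))    # ≈ 1
print([sample_integer() for _ in range(10)])  # mostly 1s and 2s
```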
Also, it will not fundamentally change the theorem if some of the probabilities are allowed to be zero.
That is a good point. My objection is clearly incompletely expressed. It is my impression that your reasoning relied on forcing the probability distribution to follow tricks like the above in order to rule out “The distribution is just uniform. That’s it. They’re all the freaking same probability.” That is, I expected you to forbid a distribution made up of 1/infinity while insisting that the distribution is infinitely large, by saying “but zero isn’t a probability”.
Basically, if you say there are infinitely many options, then I say each option is allowed to be 1/infinity, and that infinity times 1/infinity is 1. Yes, it looks like adding together lots of zeros to get one, but hey, if you mess with infinity then expect that sort of thing. I’m sure there are special mathematical ways to refer to the “1/infinity” kind of zero, but I just say that if you can have infinitely many things you can have infinitely many infinitely small identical things.
Really? That is interesting. That’s an even stronger claim than the one you made in the post. You are confident that it holds even when you remove the “since the probabilities cannot be zero” part?
You can have a uniform distribution over a continuous space, but not over integers. The reason is that probability measures are supposed to be countably additive, so if each integer has weight 0, they all together must have weight 0. To make a uniform distribution over infinitely many hypotheses, you need them to be an uncountable set (e.g. parameterize them somehow by real numbers between 0 and 1). This is exactly what happens when your hypotheses talk about an unknown real parameter: each individual value has probability 0, but an interval of values has positive probability that you obtain by integrating the prior over that interval. If you can extend this idea to hypotheses indexed by complexity, this will be a Major Breakthrough, but I think it’s very hard.
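Spelling out the countable-additivity step: if a uniform distribution gave every integer the same weight $p$, countable additivity would force $1 = P(\mathbb{Z}) = \sum_{n \in \mathbb{Z}} P(\{n\}) = \sum_{n \in \mathbb{Z}} p$, which is $0$ if $p = 0$ and infinite if $p > 0$; no choice of $p$ works.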
I think it’s worth adding that while you can have a uniform distribution over some continuous spaces, like the real numbers between 0 and 1, you can’t have a uniform distribution over all real numbers.
How can you do that? I thought [0,1] had the same cardinality as all the reals, and that the cardinality of the reals is too high to have a uniform distribution.
I’m completely flummoxed by the level of discussion here in the comments to Unknowns’s post. When I wrote a post on logic and most commenters confused truth and provability, that was understandable because not everyone can be expected to know mathematical logic. But here we see people who don’t know how to sum or reorder infinite series, don’t know what a uniform distribution is, and talk about “the 1/infinity kind of zero”. This is a rude wakeup call. If we want to discuss issues like Bayesianism, quantum mechanics or decision theory, we need to take every chance to fill the gaps in our understanding of math.
To answer your question: [0,1] does have the same cardinality as all reals, so in the set-theoretic sense they’re equivalent. But they are more than just sets: they come equipped with an additional structure, a “measure”. A probability distribution can only be defined as uniform with regard to some measure. The canonical measure of the whole [0,1] is 1, so you can set up a uniform distribution that says the probability of each measurable subset is the measure of that subset (alternatively, the integral of the constant function f(x)=1 over that subset). But the canonical measure of the whole real line is infinite, so you cannot build a uniform distribution with respect to it.
After you’re comfortable with the above idea, we can add another wrinkle: even though we cannot set up a uniform distribution over all reals, we can in some situations use a uniform prior over all reals. Such things are called improper priors and rely on the lucky fact that the arithmetic of Bayesian updating doesn’t require the integral of the prior to be 1, so in well-behaved situations even “improper” prior distributions can always give rise to “proper” posterior distributions that integrate to 1.
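Here is a minimal numerical sketch of that lucky fact (the data, grid, and known-noise normal likelihood are all illustrative assumptions): start from a flat improper prior over the whole real line and check that the posterior still integrates to 1.

```python
import numpy as np

data = np.array([4.8, 5.1, 5.3])      # hypothetical observations
sigma = 1.0                           # noise scale, assumed known

theta = np.linspace(-50, 50, 200001)  # wide grid standing in for the real line
log_like = sum(-0.5 * ((x - theta) / sigma) ** 2 for x in data)
unnorm_post = np.exp(log_like)        # flat improper prior: posterior ∝ likelihood

dx = theta[1] - theta[0]
Z = unnorm_post.sum() * dx            # finite normalizer, so the posterior is proper
posterior = unnorm_post / Z
print(posterior.sum() * dx)           # ≈ 1.0: integrates to one
print(theta[posterior.argmax()])      # ≈ 5.07, the sample mean
```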
Yeah, it might help to make a list of “math recommended for reading and posting on Less Wrong”. Unfortunately, the entire set is likely large enough that even many physics majors won’t have all of it (lots of physics people don’t take logic or model theory classes). At this point the list of math topics seems to include Lebesgue integration, Gödel’s theorems and basic model theory, the basics of continuous and discrete probability spaces, and a little bit of theoretical compsci ranging over a lot of topics (both computability theory and complexity theory seem relevant). Some of the QM posts also require a bit of linear algebra to actually grok well, but I suspect that anyone who meets most of the rest of the list will have that already. Am I missing any topics?
Not all LW participants need to know advanced math. Human rationality is a broad topic, and someone like Yvain or Alicorn can contribute a lot without engaging the math side of things. What I care about is the signal-to-noise ratio in math discussion threads. In other words, I’m okay with regular ignorance but hate vocal ignorance with a fiery passion.
I propose a simple heuristic: if you see others using unfamiliar math, look it up before commenting. If it relies on other concepts that you don’t understand, look them up too, and so on. Yes, it might take you days of frantic digging. It sometimes takes me days full-time, even though I have a math degree. Time spent learning more math is always better invested than time spent in novice-level discussions.
That seems valid, but part of the trouble also seems to be people who, after reading popularizations, think they understand math that they don’t. I’m not sure how to deal with that other than just having those individuals read the actual math.
We need merely a norm telling people to stop, not a magic way of explaining math faster than is possible. Also, a more realistic timeline for grokking (as opposed to parsing) deeper concepts is months, not days, and that’s if you are smart enough in the first place.
Agreed. I have no idea either.
Well, if someone posts something wrong, call them out on it.
Empirically that doesn’t seem to help much. See for example the comment thread in cousin_it’s last top-level post.
Cousin_it and JoshuaZ, this sounds as though it could be a good topic (or group of topics) for a top-level post.
Do you mean a set of posts on telling when you don’t know enough or a set of posts on the math people should know?
Either or both would be useful, but I was thinking about the latter.
In getting to some of those things, there are some more basic subjects that would have to be mastered:
Algebra: giving names to unknown numerical quantities and then reasoning about them the way one would with actual numbers.
Calculus: the relationship between a rate of change and a total amount, and the basic differential equations of physics, e.g. Newtonian mechanics, the diffusion equation, etc.
If this all seems like a lot, consider that many people spend until well into their twenties in school. Think where they would get to if all that time had been usefully spent!
Why is all that physics necessary? I’m not seeing it.
Practical examples. Not many people are going to plough through abstract mathematics without them.
Causal networks, axiomatizations of expected utility.
Gosh, now I don’t know whether to feel bad or not for asking that question.
But I guess ‘no, it’s not just cardinality that matters but measure’ is a good answer. Is there any quick easy explanation of measure and its use in probability?
(I have yet to learn anything from Wikipedia on advanced math topics. You don’t hear bad things about Wikipedia’s math topics, but as Bjarne said of C++, complaints mean there are users.)
Don’t feel bad, you’re actually a hero. The four levels of depravity, by cousin_it:
Ask an illiterate question.
Assert an illiterate statement.
Assert and violently defend an illiterate statement.
Assert and violently defend an illiterate statement, becoming offended and switching to ad hominems when you begin to lose.
I’m not sure if other people can successfully learn math topics from Wikipedia because I’m atypical and other-optimizing is hard. Anyway, here’s how you tell whether you actually understand a math topic: you should be able to solve simple problems. For example, someone who can’t find Nash equilibria in simple 2x2 games is unqualified to talk about the Prisoner’s Dilemma, and someone who can’t solve the different Monty Hall variants is unqualified to talk about Bayesianism and priors.
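In that spirit, a minimal simulation of the standard Monty Hall game (a sanity check one could write, not a substitute for solving it by hand):

```python
import random

def monty_hall(switch, trials=100_000):
    """Estimate the win rate of stick vs. switch in the three-door game,
    where the host always opens a losing door the player didn't pick."""
    wins = 0
    for _ in range(trials):
        car = random.randrange(3)
        pick = random.randrange(3)
        opened = random.choice([d for d in range(3) if d != pick and d != car])
        if switch:
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += (pick == car)
    return wins / trials

print(monty_hall(switch=False))  # ≈ 1/3
print(monty_hall(switch=True))   # ≈ 2/3
```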
You should post this hierarchy somewhere more permanent. It seems… useful.
Here’s a non-wiki explanation of measure. http://www.ams.org/bookstore/pspdf/stml-48-prev.pdf It’s a generalization of the concept of length.
Complaints mean there are annoyed users, yes. :-)
Also, I second this observation that Wikipedia’s math pages are not a good learning resource. They’re pretty good as reference and for refreshers on stuff you’ve already learned, but they’re no substitute for a decent textbook and/or instructor.
Yes, it’s uncountable, so the probability of any particular real number is zero. I’m not going to go into detail about what I mean by a probability distribution here (though I’ll note that cousin_it used the word “integrate”). To uniformly pick a number $x$ from 0 to 1, use an infinite sequence of coin flips to represent it in binary. Infinity is impractical, but for natural questions, like whether $x$ is between $1/5$ and $1/\sqrt{2}$, you will almost surely need only finitely many flips (and the expected number of flips is also finite). And the probability that the answer is yes is $1/\sqrt{2} - 1/5$; that is the sense in which it is uniform.
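The “finitely many flips” point can be made concrete with a sketch (assuming the coin flips reveal the binary digits of $x$ one at a time, and we stop as soon as the digits seen so far settle the question):

```python
import random

def lies_in(a, b):
    """Decide whether a uniform x in [0,1) lies in (a, b), revealing one
    binary digit of x per fair coin flip and stopping once the interval
    of x-values consistent with the flips is entirely inside or outside
    (a, b). Terminates with probability 1 for non-dyadic endpoints such
    as 1/5 and 1/sqrt(2)."""
    lo, hi, flips = 0.0, 1.0, 0
    while True:
        mid = (lo + hi) / 2
        if random.random() < 0.5:
            hi = mid   # digit 0: x lies in the lower half
        else:
            lo = mid   # digit 1: x lies in the upper half
        flips += 1
        if a <= lo and hi <= b:
            return True, flips
        if hi <= a or b <= lo:
            return False, flips

results = [lies_in(1 / 5, 2 ** -0.5) for _ in range(100_000)]
print(sum(r for r, _ in results) / len(results))  # ≈ 1/sqrt(2) - 1/5 ≈ 0.5071
print(sum(f for _, f in results) / len(results))  # average flips: small and finite
```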
You can transfer this distribution to the set of all real numbers, e.g. by logit or arctan, but it won’t be uniform in the same sense. One can satisfy that uniformity with a measure which is not a probability measure.
It won’t change “fundamentally”, i.e. in any important way, but it will slightly change the meaning of the probability decreasing on average, because, for example, the probability of all hypotheses of complexity 17 might be 0, and if this is the case then there is no greater complexity which will give a lower probability for this particular value. But in any case, either the terms will sum to 1 after a finite number of values, in which case the probability will drop permanently to 0 after that, or they will not, in which case the theorem will continue to hold as applied to all the hypotheses that have some positive probability.
Just because there are infinitely many options doesn’t mean that it’s reasonable to assign a probability of 1/infinity. These options are claims about reality that you make with a language; saying that their probability is such a value is much the same as if I said that the probability that you are right in this disagreement is 1/infinity. If someone can say this in the context of my argument, he can say it in any context, and this would not be standard probability theory.
Is the claim “you can have probability distributions with infinitely many options and no such distribution can be uniform” a premise or conclusion of your argument?
I don’t agree with the above, but even if I did, I would assert that it is trivial to reverse the order of ranking such that the arbitrary limitations on the probability distribution lead you to completely the opposite conclusion.
You can’t reverse the order, because there is a least element but no greatest element. So they can’t be swapped round.
The time taken to reverse the ordering of an ordered list depends on the size of the list. Yes, if the size is infinite you cannot reverse it. Or print it out. Or sum it. Or find the median. Infinity messes things up. It certainly doesn’t lead us to assuming:
“Things can be considered in order of complexity but not in order of simplicity (1/complexity), and therefore the most complex things must be lower probability than simple things.”
I say this is just demonstrating “things you shouldn’t do with infinity”, not proving Occam’s Razor.
That is not a conclusion. By the definitions given, there is no “most complex thing”; rather, for any given element you pick out that you might think is the most complex thing, there are infinitely many that are more complex.
Infinity is not magic.
The claim you mention is a conclusion from standard probability theory.
The order can’t be reversed, just as you can’t reverse the order of the natural numbers to make them go from high to low. Yes, you could start at a googol and count down to 1, but then after that you would be back to counting up again. In the same way, every possible ordering of hypotheses will sooner or later start ordering them from simpler to more complex.
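One way to make this precise (a sketch, on the assumption that each complexity level contains only finitely many hypotheses): let $C(h)$ be the complexity of hypothesis $h$ and let $h_{\sigma(1)}, h_{\sigma(2)}, \ldots$ be any enumeration of all the hypotheses. For any bound $k$, only finitely many hypotheses satisfy $C(h) \le k$, so only finitely many positions $n$ can have $C(h_{\sigma(n)}) \le k$. Hence $C(h_{\sigma(n)}) \to \infty$: however the enumeration begins, it must eventually list hypotheses of ever-growing complexity, and no enumeration can run from most complex to simplest.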