[deleted] comments on reply to benelliott about Popper issues

[deleted] 7 Apr 2011 15:19 UTC
9 points
The principle difference between Bayesian and Popperian epistemology is that Bayesianism is precise; it puts all necessary ambiguity in the prior, and assumes only noncontroversial, well-tested mathematical mathematical axioms, and everything thereafter is deductively sound math.

I think you’re overselling it. Here are two big weaknesses of Bayesian epistemology as I understand it:
1. it cannot handle uncertainty about unproved mathematical truths.
2. It does not describe the way any existing intelligence actually operates, or even could operate in principle. (That last clause is the problem of writing an AI.)
I have never seen on this website any argument resolved or even approached on semirigorous Bayesian lines, except a couple of not-so-successful times between Yudkowsky and Hanson. Popper, or the second-hand accounts of him that I understand, seems to describe the way that I (and I think you!) actually think about things: we collect a big database of explanations and of criticisms of those explanations, and we decide the merits of those explanations and criticisms using our messy judgement.

In cases that satisfy some mild assumptions (but not so mild as to handle the important problem 1.!) this might be equivalent to Bayesianism. But equivalences go both ways, and Popper seems to be what we actually practice—what’s the problem?
- timtyler 7 Apr 2011 17:02 UTC
  1 point
  Parent
  
  Here are two big weaknesses of Bayesian epistemology as I understand it:
  
  it cannot handle uncertainty about unproved mathematical truths.
  
  Why do you think that?
  
  It does not describe the way any existing intelligence actually operates, or even could operate in principle. (That last clause is the problem of writing an AI.)
  
  Solomonoff induction is uncomputable? So: use a computable approximation.
  - curi 7 Apr 2011 19:38 UTC
    7 points
    Parent
    
    Solomonoff induction is uncomputable? So: use a computable approximation.
    
    What is the argument that the approximation you use is good? What I mean is, when you approximate you are making changes. Some possible changes you could make would create massive errors. Others—the type you are aiming for—only create small errors that don’t spread all over. What is your method of creating an approximation of the second type?
    - timtyler 7 Apr 2011 21:02 UTC
      1 point
      Parent
      Making computable approximations of Solomonoff induction is a challenging field which it seems inappropriate to try and cram into a blog comment. Probably the short answer is “by using stochastic testing”.
    - JoshuaZ 8 Apr 2011 3:19 UTC
      0 points
      Parent
      There’s a large amount of math behind this sort of thing, and frankly, given your other comments I’m not sure that you have enough background. It might help to just read up on Bayeisian machine learning which needs to deal with just this sort of issue. Then keep in mind that there are theorems that given some fairly weak conditions to rule out pathological cases one approximate any distribution by a computable distribution to arbitrary accuracy. You need to be careful about what metric you are using but it turns out to be true for a variety of different notions of approximating and different metrics. While this is far from my area of expertise, so I’m by no means an expert on this, my impression is that the theorems are essentially of the same flavor as the theorems one would see in a real analysis course about approximating functions with continuous functions or polynomial functions.
  - [deleted] 8 Apr 2011 2:03 UTC
    4 points
    Parent
    
    Why do you think that?
    
    If you think I’m mistaken, please say so and elaborate.
    
    Solomonoff induction is uncomputable? So: use a computable approximation.
    
    It’s hard for me to believe that you haven’t thought of this, but it’s difficult to “approximate” an uncomputable function. Think of any enormous computable function f(n) you like. Any putative “approximation” of the busy beaver function is off by a factor larger than f(n). I bet Solomonoff is similarly impossible to approximate—am I wrong?
    - timtyler 8 Apr 2011 3:11 UTC
      2 points
      Parent
      I am not aware of any particular issues regarding Bayesian epistemology handling uncertainty about unproved mathematical truths. How is that different from other cases where there is uncertainty?
      
      Using a computable approximation of Solomonoff induction is a standard approach. If you don’t have a perfect compressor, you just use the best one you have. It is the same with Solomonoff induction.
      - [deleted] 8 Apr 2011 14:01 UTC
        0 points
        Parent
        
        I am not aware of any particular issues regarding Bayesian epistemology handling uncertainty about unproved mathematical truths. How is that different from other cases where there is uncertainty?
        
        I outlined the problem with mathematical uncertainty here. The only reason I believe that this is an open problem is on Yudkowsky’s say-so in his reply.
        
        Using a computable approximation of Solomonoff induction is a standard approach. If you don’t have a perfect compressor, you just use the best one you have. It is the same with Solomonoff induction.
        
        Standard approach to what? I don’t know what a “compressor” or “perfect compressor” is, if those are technical terms.
        
        To me, the question is whether an approximation to Solomonoff induction has approximately the same behavior as Solomonoff induction. I think it can’t, for instance because no approximation of the busy beaver function (even the “best compressor you have”) behaves anything like the busy beaver function. If you think this is a misleading way of looking at it please tell me.
        JoshuaZ 8 Apr 2011 14:12 UTC
        1 point
        Parent
        
        To me, the question is whether an approximation to Solomonoff induction has approximately the same behavior as Solomonoff induction. I think it can’t, for instance because no approximation of the busy beaver function (even the “best compressor you have”) behaves anything like the busy beaver function. If you think this is a misleading way of looking at it please tell me.
        
        Solomonoff induction can’t handle the Busy Beaver function because Busy Beaver is non-computable. So it isn’t an issue for approximations of Solomonoff (except in so far as they can’t handle it either).
        [deleted] 8 Apr 2011 14:16 UTC
        0 points
        Parent
        I am not saying that “Solomonoff can’t handle Busy Beaver.” (I’m not even sure I know what you mean.) I am saying that Solomonoff is analogous to Busy Beaver, for instance because they are both noncomputable functions. Busy Beaver is non-approximatable in a strong sense, and so I think that Solomonoff might also be non-approximatable in an equally strong sense.
        timtyler 8 Apr 2011 17:14 UTC
        0 points
        Parent
        Kolmogorov complexity is uncomputable, but you can usefully approximate Kolmogorov complexity for many applications using PKZIP. The same goes for Solomonoff induction. Its prior is based on Kolmogorov complexity.
        timtyler 8 Apr 2011 16:29 UTC
        0 points
        Parent
        
        I outlined the problem with mathematical uncertainty here.
        
        Don’t agree with the premise there. As to what Yudkowsky is talking about by saying “Logical uncertainty is an open problem”—it beats me. There’s really only uncertainty. Uncertainty about mathematics is much the same as other kinds of uncertainty.
        [deleted] 8 Apr 2011 16:32 UTC
        0 points
        Parent
        What premise?
        timtyler 8 Apr 2011 16:53 UTC
        0 points
        Parent
        The first 5 lines of the post—this bit::
        
        Any prior P, of any agent, has at least one of the following three properties:
        
        [deleted] 8 Apr 2011 22:49 UTC
        0 points
        Parent
        I can prove that to you, unless I made a mistake. Are you saying you can defeat it a priori by telling me a prior that doesn’t have any of those three properties?
        timtyler 9 Apr 2011 12:08 UTC
        0 points
        Parent
        Take induction, for example, where the domain of the function P(X) ranges over the possible symbols that might come next in the stream (or, if you prefer, ranges over the hypotheses that predict them).
        
        Then P(X and Y) is typically not a meaningful concept.
        [deleted] 9 Apr 2011 15:19 UTC
        0 points
        Parent
        The trichotomy is:
        
        It is not defined on all X—i.e. P is agnostic about some things
        
        It has P(X) < P(X and Y) for at least one pair X and Y—i.e. P sometimes falls for the conjunction fallacy
        
        It has P(X) = 1 for all mathematically provable statements X—i.e. P is an oracle.
        
        Taken literally, your P falls for 1. For instance it doesn’t have an opinion about whether it will be sunny tomorrow or the Riemann hypothesis is true. If you wish avoid this problem by encoding the universe as a string of symbols to feed to your induction machine, why wouldn’t you let me encode “X and Y” at the same time?
- komponisto 7 Apr 2011 16:59 UTC
  0 points
  Parent
  
  Here are two big weaknesses of Bayesian epistemology as I understand it:
  
  1.it cannot handle uncertainty about unproved mathematical truths.
  
  Why not? You just use an appropriate formalization of mathematics, and treat it as uncertainty about the behavior of a proof-searching machine.
  
  I have never seen on this website any argument resolved or even approached on semirigorous Bayesian lines, except a couple of not-so-successful times between Yudkowsky and Hanson
  
  I can think of at least one concrete example. But I’m guessing you were familiar with that example (and numerous other smaller-scale ones) and rejected it, so you must mean something different than I do by “argument approached on semirigorous Bayesian lines”.
  
  equivalences go both ways, and Popper seems to be what we actually practice—what’s the problem?
  
  Perhaps there isn’t any, except insofar as the poster is claiming that Bayes is wrong because it isn’t Popper.
  - JoshuaZ 7 Apr 2011 18:29 UTC
    0 points
    Parent
    
    Why not? You just use an appropriate formalization of mathematics, and treat it as uncertainty about the behavior of a proof-searching machine.
    
    Unfortunately this isn’t helpful. Consider for example a Turing machine that seems to halt on all inputs, and we know that when this one halts it halts with either a 0 or a 1. Does this machine represent a computable sequence (hence should have non-zero probability assigned if one is using a Solomonoff prior)? If we haven’t resolved that question we don’t know. But in order to use any form of prior over computable sequences we need to assume that we have access to what actually represents a computable hypothesis and what doesn’t. There are other problems as well.
    - komponisto 7 Apr 2011 21:15 UTC
      0 points
      Parent
      I’m having trouble parsing your third (I don’t know what it means for a Turing machine to [fail to] “represent a computable sequence”, especially since I thought that a “computable sequence” was by definition the output of a Turning machine) and fourth (we don’t know what?) sentences, but if your general point is what I think it is (“after formalizing logical uncertainty, we’ll still have meta-logical uncertainty left unformalized!”), that’s simply a mathematical fact, and not an argument against the possibility of formalizing logical uncertainty in the first place.
      - JoshuaZ 8 Apr 2011 1:52 UTC
        1 point
        Parent
        
        I’m having trouble parsing your third (I don’t know what it means for a Turing machine to [fail to] “represent a computable sequence”, especially since I thought that a “computable sequence” was by definition the output of a Turning machine)
        
        A sequence f(n) is computable if there’s a Turing machine T that given input n halts with output f(n). But, not all Turing machines halt on all inputs. It isn’t hard to make Turing machines that go into trivial infinite loops, and what is worse, Turing machines can fail to halt in much more ugly and non-obvious ways to the point where the question “Does the Turing machine M halt on input n” is not in general decidable. This is known at the Halting theorem. So if I’m using some form of Solomonoff prior I can’t even in general tell whether a machine describes a point in my hypothesis space.
        komponisto 8 Apr 2011 2:15 UTC
        0 points
        Parent
        What I don’t understand is your argument that there is a specific problem with logical uncertainty that doesn’t apply to implementing Solomonoff induction in general. Yes, the halting problem is undecidable, so you can’t decide if a sequence is computable; but assuming you’ve already got a Solomonoff-induction machine that can say “my probability that it will rain tomorrow is 50%”, why can’t it also say “my probability that the Riemann Hypothesis is true is 50%”?
        JoshuaZ 8 Apr 2011 2:23 UTC
        1 point
        Parent
        
        but assuming you’ve already got a Solomonoff-induction machine that can say “my probability that it will rain tomorrow is 50%”, why can’t it also say “my probability that the Riemann Hypothesis is true is 50%”?
        
        That’s actually a really good example. It isn’t that difficult to make a Turing machine that halts if and only if the Riemann hypothesis is true. So a system using Solomonoff induction has to recognize for starters whether or not that Turing machine halts. Essentially, in the standard version of Solomonoff induction, you need to assume that you have access to indefinitely large computing power. You can try making models about what happens when you have limited computational power in your entity (In some sense AIXI implementations and implementations of Bayesian reasoning need to do close to this). But if one doesn’t assume that one has indefinite computing power then a lot of the results about how different priors behave no longer hold (or at least the proofs don’t obviously go through). For more detail on that sort of thing I’d recommend talking to cousin_it or jimrandomh since they’ve thought and know a lot more about these issues than I do.
        komponisto 8 Apr 2011 15:19 UTC
        0 points
        Parent
        
        It isn’t that difficult to make a Turing machine that halts if and only if the Riemann hypothesis is true. So a system using Solomonoff induction has to recognize for starters whether or not that Turing machine halts.
        
        Only in the sense that a human trying to solve the Riemann Hypothesis also has to recognize whether the same Turing machine halts.
        
        When I talk about “going meta”, I really mean it: when the Solomonoff machine that I have in mind is considering “whether this sequence is computable” or “whether the Riemann Hypothesis is true” or more generally “whether this Turing machine halts”, it is going to be doing so the same way a human does: by using a model of the mathematical object in question that isn’t actually equivalent to that same mathematical object. It won’t be answering the question “natively”, the way a computer typically adds 3+5 (i.e. by specific addition algorithms built into it); instead, it will be more closely analogous to a computer being programmed to simulate three apples being combined with five apples on its display screen, and then count the apples by recognizing their visual representation.
        
        So the upshot is that to be able to give an answer the question “what is your probability that this Turing machine halts?”, the Solomonoff AI does not need to solve anything equivalent to the halting problem. It just needs to examine the properties of some internal model corresponding to the label “Turing machine”, which need not be an actual Turing machine. It is in this way that uncertainty about mathematical truths is handled.
        
        It should go without saying that this isn’t directly of use in building such an AI, because it doesn’t tell you anything about how to construct the low-level algorithms that actually run it. But this thread isn’t about how to build a Bayesian AI; rather, it’s about whether a Bayesian AI is something that it makes sense to build. And my point here is that “Well, if you had a Bayesian AI, it wouldn’t be able to give you probability estimates concerning the truth of mathematical statements” is not a valid argument on the latter question.
        [deleted] 8 Apr 2011 15:56 UTC
        0 points
        Parent
        
        So the upshot is that to be able to give an answer the question “what is your probability that this Turing machine halts?”, the Solomonoff AI does not need to solve anything equivalent to the halting problem.
        
        By “Solomonoff AI” do you mean “some computable approximation of a Solomonoff AI”? My impression is that the Solomonoff prior just does solve the halting problem, and that this is a standard proof that it is uncomputable.
        
        when the Solomonoff machine that I have in mind is considering “whether this sequence is computable” or “whether the Riemann Hypothesis is true” or more generally “whether this Turing machine halts”, it is going to be doing so the same way a human does.
        
        Humans are bad at this. Is there some reason to think that a “the Solomonoff machine you have in mind” will be better at it?
        komponisto 8 Apr 2011 18:10 UTC
        0 points
        Parent
        
        My impression is that the Solomonoff prior just does solve the halting problem,
        
        The programmer would need to have solved the halting problem in order to program the Solomonoff prior into the AI, but the AI itself would not be solving the halting problem.
        
        when the Solomonoff machine that I have in mind is considering “whether this sequence is computable” or “whether the Riemann Hypothesis is true” or more generally “whether this Turing machine halts”, it is going to be doing so the same way a human does.
        
        Humans are bad at this. Is there some reason to think that a “the Solomonoff machine you have in mind” will be better at it?
        
        It may or may not be (though hopefully it would be more intelligent than humans—that’s the point after all!); it doesn’t matter for the purpose of this argument.
        [deleted] 9 Apr 2011 3:15 UTC
        0 points
        Parent
        
        The programmer would need to have solved the halting problem in order to program the Solomonoff prior into the AI, but the AI itself would not be solving the halting problem.
        
        This strikes me as a confused way of looking at things. As you know, it’s a theorem that the halting problem is unsolvable in a technical sense. That theorem expresses a limitation on computer programs, not computer programmers.
- khafra 7 Apr 2011 15:36 UTC
  0 points
  Parent
  My thinking seems to me more qualitatively bayesian than popperian. I don’t have a good enough memory to keep all the criticisms I’ve ever heard of each theory I provisionally accept in mind. Instead, when I encounter a criticism that seems worth considering, I decrease my belief in the theory by an amount corresponding to the strenghth of the criticism. If I then go on to find evidence that weakens the criticism, strengthens the original theory, or weakens all possible alternate theories, I increase my belief in the original theory again.
  - curi 7 Apr 2011 21:18 UTC
    −1 points
    Parent
    You raise an interesting issue which is: what is the strength of a criticism? How is that determined?
    
    For example, your post is itself a criticism of Popperian epistemology. What is the strength of your post?
    
    By not using strengths of arguments, I don’t have this problem. Strengths of arguments remind me of proportional representation voting where every side gets a say. PR voting makes a mess of things, not just in practice but also in terms of rigorous math (e.g. Arrow’s Theorem)
    - Sniffnoy 8 Apr 2011 4:58 UTC
      2 points
      Parent
      What does Arrow’s theorem have to do with proportional representation? Arrow’s theorem deals with single-winner ordinal voting systems. Is there some generalization that covers proportional representation as well?
      - curi 8 Apr 2011 9:37 UTC
        −3 points
        Parent
        For one thing, all elections have a single overall outcome that wins.
        Sniffnoy 8 Apr 2011 10:12 UTC
        2 points
        Parent
        Indeed, but “single-winner” has a technical meaning here that’s rather more restrictive than that. Unless each voter could choose their vote aribitrarily from among the set of those overall outcomes, it’s not single-winner.
        curi 8 Apr 2011 16:58 UTC
        −9 points
        Parent
        can you give the meaning of “single winner” and the reason you think not directly voting for the single winner will remove the problems?
        
        in the US presidential elections, no voters can directly vote for his desired overall outcome. we have a primary system. are you saying that the primary system makes arrow’s theorem’s problems go away for us?
        Sniffnoy 8 Apr 2011 17:53 UTC
        2 points
        Parent
        This appears to be very confused. Arrow’s theorem is a theorem, a logical necessity, not a causal influence. It does not go around causing problems, that can be then be warded off by modifying your system to avoid its preconditions. It’s just a fact that your voting system can’t satisfy all its desiderata simultaneously. If you’re not talking about a single-winner voting system it’s simply an inapplicable statement. Perhaps the system meets all the desiderata, perhaps it doesn’t. Arrow’s theorem simply has nothing to say about it. But if you take a system that meets the preconditions, and simply modify it to not do so, then there’s no reason to expect that it suddenly will start meeting the desiderata. For instance, though I think it’s safe to say that US presidential elections are single winner, even if they somehow didn’t count as such they’d fail to satisfy IIA. My point is just don’t bring up theorems where they don’t apply.
        
        If you want a formal statement of Arrow’s theorem… well it can take several forms, so take a look at the definitions I made in this comment about their equivalence. Then with that set of definitions, any voting system satisfying IIA and unanimity (even in just the “weak” form where if everyone votes for the same linear order, the overall winner is the winner of that linear order) with a finite set of voters must be a dictatorship. (There’s a version for infinite sets of voters too but that’s not really relevant here.)
        curi 8 Apr 2011 18:03 UTC
        −14 points
        Parent
        You’re very condescending and you’re arguing with my way of speaking while refusing to actually provide a substantive argument.
        
        it does apply. you’re treating yourself as an authority, and me a beginner. and just assuming that knowing nothing about me.
        
        you made a mistake which i explained in my second paragraph. your response: to ignore it, and say you were right anyway and i should just read more.
        Sniffnoy 8 Apr 2011 18:13 UTC
        4 points
        Parent
        
        and just assuming that knowing nothing about me.
        
        I am, in fact, inferring that based on what you wrote.
        
        you made a mistake which i explained in my second paragraph. your response: to ignore it, and say you were right anyway and i should just read more.
        
        Then I must ask that you point out this mistake more explicitly, because nothing in that second paragraph contradicts anything I said.
        curi 8 Apr 2011 18:15 UTC
        −3 points
        Parent
        Your claims implied Arrow’s Theorem doesn’t apply to the US election system. Or pretty much any other. You also defined “single-winner” so that the single US president who wins the election doesn’t qualify.
        
        OK, you’re right I didn’t actually contradict you. But don’t you think that’s a mistake? I think it does apply to real life voting systems that people use.
        Expand this thread
        Sniffnoy 8 Apr 2011 18:18 UTC
        5 points
        Parent
        Perhaps you misunderstand what is meant when people say a theorem “applies”. It means that the preconditions are met (and that therefore the conclusions hold), not simply that the conclusions hold. The conclusions of a theorem can hold without the theorem being at all applicable.
        curi 8 Apr 2011 18:19 UTC
        −4 points
        Parent
        If you meant “Arrow’s theorem is not applicable to any voting system on Earth” why did you object to my statements about PR? The PR issue is irrelevant, is it not?
    - JoshuaZ 8 Apr 2011 1:47 UTC
      1 point
      Parent
      
      By not using strengths of arguments, I don’t have this problem.
      
      Do you intend to treat all criticism equally?
  - [deleted] 7 Apr 2011 15:48 UTC
    −2 points
    Parent
    I’m suspicious of the notion of “increasing” and “decreasing” a belief. Did you pick those words exactly because you didn’t want to use the word “updating”? Why not?
    
    My guess is that having a bad memory is as much a disadvantage for Bayesians as for Popperians.
    - khafra 7 Apr 2011 16:06 UTC
      4 points
      Parent
      I’m suspicious of your suspicion. Is it purely because of the terms I used, or do you really have no beliefs you hold more tenuously than others?
      
      If I ask you whether the sun rose this morning, will you examine a series of criticisms of that idea for validity and strength?
      
      If I ask you whether it’s dangerous to swim after a heavy meal, you’ll probably check snopes to verify your suspicion that it’s an old wives’ tale, and possibly check any sources cited at snopes. But will you really store the details of all those arguments in your memory as metadata next to “danger of swimming after a heavy meal,” or just mark it as “almost certainly not dangerous”?
      - curi 7 Apr 2011 21:19 UTC
        4 points
        Parent
        To save on storage, learn powerful explanations. Sometimes ideas can be integrated into better ideas that elegantly cover a lot of ground. Finding connections between fields is an important part of learning.
        
        Learning easy to remember rules of thumb—and improving them with criticism when they cause problems—is also valuable for some applications.
      - [deleted] 7 Apr 2011 23:18 UTC
        1 point
        Parent
        I don’t think the shortcuts I take on easy questions are very demonstrative of anything.
        
        I’m suspicious of your suspicion. Is it purely because of the terms I used, or do you really have no beliefs you hold more tenuously than others?
        
        I know what it means to think an outcome is likely or not likely. I don’t know what it means for a belief to be tenuous or not tenuous.