[deleted] comments on (Subjective Bayesianism vs. Frequentism) VS. Formalism

[deleted] 28 Nov 2011 18:09 UTC
4 points
Regarding frequentism vs. Bayesianity in practical applications, the message I take from Yudkowsky and Jaynes is that frequentists have tended historically to lack apprehension of the fact that their methods are ad-hoc, and in general they fail to use Bayesian power when it is in fact advisable to do so—whereas Bayesians feel they can use ad-hoc approximate methods or accurate methods, whichever is appropriate to the task. This is a case in which a questionable philosophy needn’t hamstring someone’s thinking in principle, but appears to do so fairly predictably as a matter of fact.

Incidentally I’m surprised that there appears to be so much disagreement about this, given that LW is basically a forum brought into existence on the strength of Yudkowsky’s abilities as a thinker, writer and populariser, and he clearly holds frequentism in contempt. It’s not necessarily a bad thing that some people here are sympathetic to frequentism—intellectual diversity is good—I’m just surprised that there are so many on a Bayesian rationality forum!

About Maxent: I had in mind chapter 5 of this book by Li and Vitanyi.

We can formulate scientific theories in two steps. First, we formulate a set of possible alternative hypotheses, based on scientific observations or other data. Second, we select one hypothesis as the most likely one. Statistics is the mathematics of how to do this. A relatively recent paradigm in statistical inference was developed by J.J. Rissanen and by C.S. Wallace and his coauthors. The method can be viewed as a computable approximation to the incomputable approach in Section 5.2 [i.e. Solomonoff induction] and was inspired by it. In accordance with Occam’s dictum, it tells us to go for the explanation that compresses the data the most. [...]

This is the MDL (minimum description length) principle.

The ideal MDL principle selects the hypothesis H that minimizes K(H) + K(D|H) [...]

Where K is Kolmogorov complexity.

Unfortunately, the function K is not computable (Section 3.4). For practical applications one must settle for easily computable approximations. [...]

So ideal MDL, like Solomonoff induction, is also incomputable!
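
To make "settle for easily computable approximations" concrete, here is a minimal sketch of a crude two-part code in Python. It is not the approximation Li and Vitanyi give on page 390; it is only an illustration under assumptions of my own. The hypotheses are Bernoulli models of a bit string, the number of bits used to state the parameter stands in for K(H), and the Shannon code length of the data under that parameter stands in for K(D|H). The names `two_part_code_length` and `mdl_select` are hypothetical.

```python
import math


def two_part_code_length(bits, precision_bits):
    """Total description length, in bits, of `bits` under a Bernoulli model
    whose parameter is stated to `precision_bits` bits of precision."""
    n = len(bits)
    ones = sum(bits)
    if precision_bits == 0:
        # Hypothesis "fair coin": no parameter needs to be transmitted.
        p = 0.5
    else:
        # Quantise the empirical frequency onto a grid of 2**precision_bits
        # cells, using cell midpoints so that p is never exactly 0 or 1.
        cells = 2 ** precision_bits
        index = min(int(ones / n * cells), cells - 1)
        p = (index + 0.5) / cells
    model_cost = precision_bits  # assumed stand-in for K(H)
    # Shannon code length of the data given p: assumed stand-in for K(D|H).
    data_cost = -(ones * math.log2(p) + (n - ones) * math.log2(1 - p))
    return model_cost + data_cost


def mdl_select(bits, max_precision=8):
    """Return the precision (0 means "fair coin") with the shortest total code."""
    return min(range(max_precision + 1),
               key=lambda k: two_part_code_length(bits, k))


balanced = [0, 1] * 50          # half ones: nothing to gain from a parameter
skewed = [1] * 90 + [0] * 10    # mostly ones: a stated bias pays for itself
print(mdl_select(balanced), mdl_select(skewed))
```

For the balanced string the selection falls back to the parameter-free fair coin, whereas for the skewed string a few bits spent on stating the bias are repaid by a shorter encoding of the data. That trade-off between the two terms is the point of the sum K(H) + K(D|H).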

They go on to discuss approximations, and on page 390 (I don’t know if you have a copy of the book) they provide a usable approximation to be referred to as “MDL”. Later on page 398 they discuss Maxent, and conclude that that too is an approximation to ideal MDL.

As far as I can see, Maxent is more useful in practical applications than their approximate MDL. I felt that Maxent needed to be defended, since Jaynes considered it a major element of Bayesian probability theory; and as far as I can see there is currently no clearly better practical method of generating priors, so Maxent should not be counted among Bayesianity’s “legitimate issues” vis-à-vis frequentism.
- thomblake 28 Nov 2011 21:41 UTC
  3 points
  Parent
  
  Incidentally I’m surprised that there appears to be so much disagreement about this, given that LW is basically a forum brought into existence on the strength of Yudkowsky’s abilities as a thinker, writer and populariser, and he clearly holds frequentism in contempt. It’s not necessarily a bad thing that some people here are sympathetic to frequentism—intellectual diversity is good—I’m just surprised that there are so many on a Bayesian rationality forum!
  
  My intuition here is that you are not observing so many people who are sympathetic to frequentism, so much as people who are unsympathetic to holding contempt.
  
  In many of the comments here you seem to be missing a simple point about mathematics and reference due to its relationship to tribal signaling between the “Bayesians” and the “Frequentists”.
  - [deleted] 28 Nov 2011 22:01 UTC
    0 points
    Parent
    I’ve yet to see anything in this article, or the resulting comments thread, to suggest that the OP has anything to say apart from “let’s say ‘models’ instead of ‘is’ (but mean the same thing)”. And the only consequence of this is to puff up frequentism.
    
    I tried (and apparently failed miserably) to make the case that in the interests of sanity, we should define our terms such that probability ≡ subjective degrees of belief. That’s all it is, a definition—there’s no philosophical significance to this “is” beyond that. It is not a claim that the frequency interpretation doesn’t fit Cox’s postulates—this is a naive interpretation of how language is used on the OP’s part.
    
    The definitional dispute about sound is inapt, because there is nothing to be gained by defining sound as one thing or the other. In this case however there is a real benefit to defining our terms in one particular way.
    
    I will however delete the downvoted posts in this thread, to honour the great disapproval with which my conception of rationality has apparently met in this case.
    - thomblake 28 Nov 2011 22:08 UTC
      1 point
      Parent
      
      I will however delete the downvoted posts in this thread, to honour the great disapproval with which my conception of rationality has apparently met in this case.
      
      Generally, deleting posts with responses is impolite, as the discussion may be helpful to future readers.
    - thomblake 28 Nov 2011 22:11 UTC
      0 points
      Parent
      
      I tried (and apparently failed miserably) to make the case that in the interests of sanity, we should define our terms such that probability ≡ subjective degrees of belief.
      
      I don’t think you ever supplied a term other than “probability” that we should use for what the OP thought “probability” means. So we’re still left with three entities and two words.
      - [deleted] 28 Nov 2011 22:31 UTC
        2 points
        Parent
        
        I don’t think you ever supplied a term other than “probability” that we should use for what the OP thought “probability” means. So we’re still left with three entities and two words.
        
        Seems like a non-problem. Just say “I am entering these frequencies into Bayes’s theorem”, “I am using the mathematical tools of probability theory” or something like that.
        
        Or perhaps say “probability is a measure of subjectively objective degrees of belief”, and “probability theory is the set of mathematical tools used to compute probabilities, which can also be used to compute frequencies as the case may be”.
        
        Which is pretty much what happens already! This is why I object to such an article—it’s a solution looking for a problem, which creates the illusion of a problem by a) being illiterate, so making itself hard to pin down b) nitpicking the use of words.
        
        Generally, deleting posts with responses is impolite, as the discussion may be helpful to future readers.
        
        They were also steadily generating an amount of negative karma days after posting that I felt was disproportionate, considering they were a sincere attempt to reach agreement with a less-than-articulate interlocutor.
        dlthomas 28 Nov 2011 22:58 UTC
        3 points
        Parent
        
        They were also steadily generating an amount of negative karma days after posting that I felt was disproportionate, considering they were a sincere attempt to reach agreement with a less-than-articulate interlocutor.
        
        Would not retraction have served?
        thomblake 28 Nov 2011 22:58 UTC
        3 points
        Parent
        
        They were also steadily generating an amount of negative karma days after posting that I felt was disproportionate, considering they were a sincere attempt to reach agreement with a less-than-articulate interlocutor.
        
        I did not find User:potato less-than-articulate.
        
        a) being illiterate, so making itself hard to pin down
        
        I’m not sure what you mean by “illiterate” here, nor (thus) how it would make itself ‘hard to pin down’.
        
        b) nitpicking the use of words.
        
        The dispute was about the proper use of words. I did not see anything that looked like ‘nitpicking’ in that context.
        
        The advantage of “Formalism” over “Bayesianism” or “Frequentism” is that it clearly marks the mathematical toolkit, makes it clear what Bayesians and Frequentists are separately talking about, gets rid of the slippage Frequentists allegedly make between “degrees of belief” and “frequencies”, and removes the question of what “probability” is “really” about, all without having to raise a flag in the mind-killing tribal warfare between “Bayesians” and “Frequentists”.
        
        But then, it’s been noted that “a philosopher has never met a distinction he didn’t like”, so perhaps I’m just biased in favor of making clearer the distinction.
        [deleted] 28 Nov 2011 23:33 UTC
        0 points
        Parent
        So in “formalism”, I understand that we are to say: “probability models frequency”, “probability models subjective degrees of belief” and “probability is the set of mathematical discoveries we have made, which deal with [ ], including such things as Bayes’s theorem”.
        
        Whereas at the moment, Bayesians say: “probability is a measure of subjective degrees of belief”, “probability isn’t frequency”, and “probability theory is the set of mathematical discoveries we have made, which deal with probability, including such things as Bayes’s theorem”.
        
        And frequentists say: “probability is long-run frequency”, “probability isn’t subjective degrees of belief”, and “probability theory is the set of mathematical discoveries we have made, which deal with probability, including such things as Bayes’s theorem”.
        
        I like the Bayesian version. But the frequentist version doesn’t confuse me; I understand perfectly well that these are merely competing interpretations, and I’ve never felt the urge to argue specifically about whether probability is degrees of belief or is frequency—nor have I ever seen anyone else do so. Clearly that would be a stupid argument, just like the definitional dispute about sound. However, sensible people do use these terms, arguing about whether probability ‘is’ one or the other, as a proxy for a more substantive argument about which is the better—i.e. more philosophically parsimonious, and having better practical outcomes—interpretation. (Actually they are more likely to phrase the argument as “probability should be considered to be X”, and then say probability is X when they aren’t having the argument, but hey.)
        
        As for the “formalist” version, firstly it puts the frequentist and Bayesian interpretations on a level footing. Even if sensible people were wasting time and effort arguing specifically over a mere definition, the cost of conceding ground to the problematic frequentist interpretation outweighs any benefit from ending that argument, in comparison to the option of simply carrying on using the language of the Bayesian.
        
        Furthermore it appears to me that probability theory, given this use of language, lacks a referent. Probability theory has been renamed (simply) probability, and it no longer appears to be the theory of anything. Whether or not this use of language could be considered wrong per se, it hardly seems to be clearing up any philosophical confusion! If I ask “what is this thing that I am computing using Bayes’s theorem?”, the answer is no longer “the posterior probability”—if probability is the new word for the mathematical tools of probability theory, the phrase posterior probability no longer means anything. So perhaps I’ll have to invent a new word to refer to the same thing that the word probability used to refer to.
        
        Do you begin to see why I think this is a waste of time?
        
        NB: I think we’re making much more progress than I made with user:potato. That’s what I mean about the difficulty of having to argue with someone who is inarticulate, i.e. can’t state his case properly.
        thomblake 29 Nov 2011 0:13 UTC
        2 points
        Parent
        
        “probability is the set of mathematical discoveries we have made, which deal with [ ], including such things as Bayes’s theorem”.
        
        Probably better put in terms of being a formal system, rather than “a set of mathematical discoveries”. But I fear that tends towards begging the question!
        
        As for the “formalist” version, firstly it puts the frequentist and Bayesian interpretations on a level footing. Even if sensible people were wasting time and effort arguing specifically over a mere definition, the cost of conceding ground to the problematic frequentist interpretation outweighs any benefit from ending that argument, in comparison to the option of simply carrying on using the language of the Bayesian.
        
        This treatment (notably the use of terms like “conceding ground”) suggests that you are engaging in a “political”/“debate” mode rather than a “truth-seeking” mode. This leads me to believe that we have more to lose by accepting the “Bayesian/Frequentist” duality than by dissolving it entirely and changing our terminology to match. This matches my impression of previous forays into the “Bayesian/Frequentist” ‘holy wars’.
        
        If politics is mind-killing, then it must certainly be avoided even at great cost with respect to our most basic tools of rationality.
        
        Do you begin to see why I think this is a waste of time?
        
        Indeed, though in that case you’ve spent far more time on this than most who exercised the default ‘ignore’ option.
        
        If I ask “what is this thing that I am computing using Bayes’s theorem?”, the answer is no longer “the posterior probability”—if probability is the new word for the mathematical tools of probability theory, the phrase posterior probability no longer means anything. So perhaps I’ll have to invent a new word to refer to the same thing that the word probability used to refer to.
        
        A good point.
        
        That’s what I mean about the difficulty of having to argue with someone who is inarticulate, i.e. can’t state his case properly.
        
        I understood what you meant—I just did not see any inarticulateness on the part of User:potato.
        
        I’ve never felt the urge to argue specifically about whether probability is degrees of belief or is frequency—nor have I ever seen anyone else do so.
        
        I normally see this being explicitly the subject on Bayesian/Frequentist debates, and many long conversations with philosophers have revolved around whether “equating probability with subjective belief” is an “ontological confusion”.
        [deleted] 29 Nov 2011 11:12 UTC
        0 points
        Parent
        
        This treatment (notably the use of terms like “conceding ground”) suggests that you are engaging in a “political”/“debate” mode rather than a “truth-seeking” mode.
        
        Duly noted. I’ll try not to give this impression in future.
        
        I normally see this being explicitly the subject on Bayesian/Frequentist debates, and many long conversations with philosophers have revolved around whether “equating probability with subjective belief” is an “ontological confusion”.
        
        I may have simply failed to notice these arguments taking place. In order to dissolve any such ostensible ontological question, I’d recommend pointing out that to say probability is one or other thing is merely a statement to the effect that one interpretation is preferred for some reason by the writer—since both interpretations satisfy the Cox postulates or Kolmogorov axioms, we could define probability to be either subjective degrees of belief or long-run frequency, and make sound and rational inferences in either case (albeit perhaps not with the same efficiency). This should be enough to persuade an otherwise sensible person that he’s engaged in a futile argument about definitions.
        
        Formalism attempts to solve the problem by effectively tabooing the concept of probability such that it no longer has a definition. Although we might be able to get around the problem that I mentioned by answering the question “what is this thing that I am computing using Bayes’s theorem?” by saying “the posterior subjective degree of belief” or “the posterior frequency”, it’s easy to see how the same kind of philosophers would end up arguing over whether, in the case of a coin flip for example, we are really talking about prior and posterior subjective degrees of belief, or about prior and posterior long-run frequencies. And we would have lost the use of the word “probability”, which makes our messages shorter than they would otherwise be.
        
        To the extent that there is such a thing as the proper use of words, this isn’t it: deleting useful words from our vocabulary in order to (probably unsuccessfully) prevent people from having a definitional argument that could best be dispelled by introducing them to such notions as “dissolving the question” and reductionism. On the other hand I’ll give user:potato credit for exposing an issue that may be more problematic than I at first believed.
        
        I expect that we are substantially in agreement at this point.
      - wnoise 28 Nov 2011 23:50 UTC
        0 points
        Parent
        FWIW, I think my three preferred terms are “Probabilities”, “Frequencies”, and “Normed Measure Theory”. That’s what Kolmogorov’s formalization amounts to anyway, and as the OP said it truly need not be connected to either probabilities or frequencies in a given use.
- jsteinhardt 29 Nov 2011 5:22 UTC
  0 points
  Parent
  I don’t understand. Based on reading through the passages you referenced in PtLoS, maximum entropy is a way of choosing a distribution out of a family of distributions (which, by the way, is a frequentist technique, not a Bayesian one). Solomonoff induction is a choice of prior. I don’t really understand in what sense these are related to each other, or in what sense Maxent generates priors at all.
  
  Incidentally I’m surprised that there appears to be so much disagreement about this, given that LW is basically a forum brought into existence on the strength of Yudkowsky’s abilities as a thinker, writer and populariser, and he clearly holds frequentism in contempt.
  
  I’ve always felt that the frequentists that Eliezer argues against are straw men. As I said earlier, I’ve never met a frequentist who is guilty of the accusations that you keep making, although I have met Bayesians whose philosophy interfered with their ability to do good statistical modeling / inference. Have you actually run into the people who you seem to be arguing against? If not, then I think you should restrict yourself to arguing against opinions that people are actually trying to support, although I also think that whether or not some very foolish people happen to be frequentists is irrelevant to the discussion (something Eliezer himself discussed in the “Reversed Stupidity is not Intelligence” post).
  - nshepperd 29 Nov 2011 7:25 UTC
    2 points
    Parent
    If you know nothing about a variable except that it’s in the interval [a, b], your probability distribution must be from the class of distributions where p(x) = 0 for x outside of [a, b]. You pick the distribution of maximal entropy from this class as your prior, to encode ignorance of everything except that x ∈ [a,b].
    
    That is one way Maxent may generate a prior, anyway.
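
    For anyone who wants the step spelled out, here is a sketch of the standard derivation, assuming the only constraint is normalisation on [a, b] and that entropy means differential entropy. The multiplier λ is just bookkeeping introduced for this sketch.

    ```latex
    % Maximise the differential entropy subject to normalisation on [a, b].
    \[
      H[p] = -\int_a^b p(x)\,\ln p(x)\,dx ,
      \qquad
      \int_a^b p(x)\,dx = 1 .
    \]
    % Lagrangian with multiplier \lambda for the normalisation constraint.
    \[
      \mathcal{L}[p] = -\int_a^b p(x)\,\ln p(x)\,dx
                       + \lambda\left(\int_a^b p(x)\,dx - 1\right)
    \]
    % Setting the functional derivative to zero shows p is constant on [a, b].
    \[
      \frac{\delta \mathcal{L}}{\delta p(x)} = -\ln p(x) - 1 + \lambda = 0
      \;\Longrightarrow\;
      p(x) = e^{\lambda - 1} .
    \]
    % Normalisation fixes the constant, giving the uniform prior.
    \[
      p(x) = \frac{1}{b - a} \quad \text{for } x \in [a, b],
      \qquad
      H[p] = \ln(b - a) .
    \]
    ```

    So with no constraint beyond the support, the maximum entropy choice is the uniform density. Adding further constraints, such as a known mean, would change the answer.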
  - Manfred 29 Nov 2011 5:47 UTC
    2 points
    Parent
    
    a way of choosing a distribution out of a family of distributions (which, by the way, is a frequentist technique, not a Bayesian one).
    
    We can call dibs on things now? Ooh, I call dibs on approximating a slowly varying function as a constant!