If you are given a steel die to physically experiment with, there again are much better (faster) ways to find out the probabilities than just tossing it (do you even understand that your errors converge as 1/sqrt(N), or how important an issue that is in practice?!).
The world often isn’t nice enough to give us the steel die. Figuratively, the steel die may be inside someone’s skull, thousands of years in the past, millions of light-years away, or you may have five slightly different dice and really want to learn about the properties of all dice.
I do understand the O(N^(-1/2)) convergence of errors. I spend a lot of time working on problems where even consistency isn’t guaranteed (i.e., nonparametric problems where the “number of parameters” grows in some sense with the amount of data) and finding estimators with such convergence properties would be great there.
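(To make the 1/sqrt(N) point concrete, here is a minimal Python sketch, with simulated tosses standing in for the physical experiment and arbitrary sample sizes: the error of the empirical frequency of one face shrinks roughly like 1/sqrt(N).)

```python
import numpy as np

rng = np.random.default_rng(0)
p_true = 1 / 6  # true probability of any single face of a fair die

# Empirical frequency of face 1 from N simulated tosses, for growing N.
for N in (100, 10_000, 1_000_000):
    tosses = rng.integers(1, 7, size=N)   # faces 1..6
    p_hat = np.mean(tosses == 1)
    print(f"N={N:>9,}  |error|={abs(p_hat - p_true):.5f}  1/sqrt(N)={N**-0.5:.5f}")
```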
‘Probability of model given the data’ is not well defined, unless you count stuff like ‘Solomonoff induction as a prior’, where it is defined but not computable (and amounts mathematically to assigning a probability of 1 to the ‘we live inside a Turing machine’ model).
It’s perfectly well-defined. It’s just subjective in a way that makes you (and a great number of informed, capable, and thoughtful statisticians) apparently very uneasy. There’s some theory that gives pretty general conditions under which Bayesian procedures converge to the true answer, regardless of the choice of prior, given enough data. You probably wouldn’t be happy with the rates of convergence for these methods, because they tend to be slower and harder to obtain than for, e.g., maximum-likelihood estimation with iid normally-distributed data.
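(A toy illustration of the prior washing out, not of the general theory alluded to above: a conjugate Beta–Binomial sketch in which two quite different priors over a coin's bias give nearly identical posteriors once the data dominate. The specific priors and true bias are made up for the example.)

```python
import numpy as np

rng = np.random.default_rng(1)
p_true = 0.35
N = 10_000
heads = rng.binomial(N, p_true)

# Two conjugate Beta priors: one nearly flat, one strongly (and wrongly) opinionated.
priors = {"flat Beta(1, 1)": (1, 1), "opinionated Beta(50, 5)": (50, 5)}
for name, (a, b) in priors.items():
    a_post, b_post = a + heads, b + (N - heads)
    mean = a_post / (a_post + b_post)
    sd = np.sqrt(a_post * b_post / ((a_post + b_post) ** 2 * (a_post + b_post + 1)))
    print(f"{name:>24}: posterior mean {mean:.4f} +/- {sd:.4f}  (true p = {p_true})")
```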
The experimental physicists publish the probability of the data given the model; people can then combine that with their priors if they want.
They might well do this. As a frequentist, this is a natural step in establishing confidence intervals and such, after they have estimated the quantity of interest by choosing the model that maximizes the probability of the data. This choice may not look like “Standard Model versus something else” but it probably looks like “semi-empirical model of the system with parameter 1 = X” where X can range over some reasonable interval.
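(A stripped-down sketch of those two frequentist steps, with made-up numbers and a simple binomial model standing in for a real physics analysis: maximize the probability of the data over the parameter, then attach a confidence interval.)

```python
import numpy as np

rng = np.random.default_rng(2)
N = 5_000
k = rng.binomial(N, 0.12)  # observed successes from an unknown rate

# Step 1: choose the parameter value that maximizes the probability of the data.
# For a binomial likelihood the maximizer is simply k / N.
p_mle = k / N

# Step 2: attach a 95% (Wald) confidence interval around the estimate.
se = np.sqrt(p_mle * (1 - p_mle) / N)
lo, hi = p_mle - 1.96 * se, p_mle + 1.96 * se
print(f"MLE = {p_mle:.4f}, 95% CI = [{lo:.4f}, {hi:.4f}]")
```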
unless you count stuff like ‘Solomonoff induction as a prior’
I don’t see what role Solomonoff induction plays in a discussion of frequentism versus Bayesianism. I never mentioned it, I don’t know enough about it to use it, and I agree with you that it shows up on LW more as a mantra than as an actual tool.
The world often isn’t nice enough to give us the steel die.
The point is that the probability with the die comes in as a frequency (the fraction of the initial phase space). Yes, sometimes nature doesn’t give you the die; that does not invalidate the fact that probability exists as an objective property of a physical process, as per frequentism (related to how the process maps the initial phase space to the final phase space); the methods employing subjectivity have to try to conform to this objective property as closely as possible (e.g. by trying to learn more about how the system works). Bayesianism is not opposed to this, unless we are to speak of some terribly broken Bayesianism.
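(A toy sketch of ‘probability as the fraction of initial phase space’: the die’s physics is replaced here by the chaotic doubling map, and the probability of each ‘face’ is read off as the fraction of a fine grid of initial conditions that ends up there. The map, the grid size, and the number of iterations are arbitrary choices for illustration.)

```python
import numpy as np

# Toy stand-in for the die's dynamics: the chaotic doubling map x -> 2x mod 1.
# The "face" produced by an initial condition is which sixth of [0, 1) the state
# occupies after many iterations; the objective probability of a face is the
# fraction of the initial phase space (here, a fine grid over [0, 1)) mapped to it.
x = np.linspace(0.0, 1.0, 600_000, endpoint=False)  # initial conditions
for _ in range(30):                                  # iterate the chaotic dynamics
    x = (2.0 * x) % 1.0
faces = (x * 6).astype(int)                          # outcome for each initial condition
for face in range(6):
    print(f"face {face + 1}: fraction of initial conditions = {np.mean(faces == face):.4f}")
```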
‘Probability of model given the data’ is not well defined,
It’s perfectly well-defined.
Nope. Only the change to the probability of the model given the data is well defined. The probability itself isn’t; you can pick an arbitrary starting point.
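(A minimal sketch of the distinction I mean, with made-up numbers: for two candidate models of a coin, the likelihood ratio, i.e. the update, is the same for everyone, while the posterior probability of each model depends entirely on the starting point you picked.)

```python
from math import comb

# Data: 60 heads in 100 tosses. Two candidate models for the coin.
N, k = 100, 60
models = {"fair (p=0.5)": 0.5, "biased (p=0.6)": 0.6}

# Likelihood of the data under each model -- well defined, no prior needed.
lik = {m: comb(N, k) * p**k * (1 - p) ** (N - k) for m, p in models.items()}
bayes_factor = lik["biased (p=0.6)"] / lik["fair (p=0.5)"]
print(f"likelihood ratio (biased : fair) = {bayes_factor:.2f}  # same for everyone")

# Posterior probability of 'biased' -- depends on the arbitrary starting point.
for prior_biased in (0.5, 0.1, 0.9):
    post_odds = bayes_factor * prior_biased / (1 - prior_biased)
    print(f"prior P(biased)={prior_biased:.1f} -> posterior P(biased)={post_odds / (1 + post_odds):.3f}")
```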
There’s some theory that gives pretty general conditions under which Bayesian procedures converge to the true answer,
The notion of ‘true answer’ is frequentist....
edit: Recall that the original argument here was about the trope of Bayesianism being opposed to frequentism, etc. The point with Solomonoff induction is that once you declare something like it a source of priors, all the math you’ll be doing should be completely identical to frequentist math (where the frequencies are over Turing machines fed random tape, and the math is done as in my top-level post for the die), just as long as you don’t simply screw your math up. The point of the die example was that no Bayesian worth their salt objects to there being a property of a chaotic process, namely what fraction of the initial phase space gets mapped where, because this property really does exist.