In particular, I think maximum-entropy methods are not Bayesian, in the sense that they do not follow from the Cox-Polya desiderata.
IIRC, this was my understanding of Jaynes’s position on maxent:
1. the Cox-Polya desiderata say that multiple allowed derivations of a problem ought to all lead to the same answer;
2. if we consider a list of identifiers about which we know nothing, and we ask whether the first one is more likely than the nth one, then we should answer that they are equally likely, because if we said either “greater than” or “less than”, we could shuffle the list and get a contradictory answer; by induction, all members of the list ought to be equiprobable, which forces each entry to have probability 1/n;
3. hence, we get the principle of indifference (points 1-3 are my version of chapter 2 or 3, IIRC);
4. maxent is just the same idea, abstracted and applied to non-list thingies. (I haven’t actually gotten this far, but it seems like the obvious next step.)
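The induction argument above can be sanity-checked numerically: with no constraints beyond normalization, maximizing Shannon entropy over a finite sample space does land on the uniform distribution. A minimal sketch (my own, using a simple multiplicative-update scheme; nothing here is from the book):

```python
import numpy as np

def maxent_uniform(n, steps=200, eta=0.1, seed=0):
    """Numerically maximize H(p) = -sum p_i log p_i over the simplex,
    with no constraint other than normalization."""
    rng = np.random.default_rng(seed)
    p = rng.dirichlet(np.ones(n))  # arbitrary positive starting distribution
    for _ in range(steps):
        # multiplicative (mirror-ascent style) step on the entropy gradient,
        # then renormalize back onto the simplex
        p = p * np.exp(-eta * (np.log(p) + 1.0))
        p /= p.sum()
    return p

print(maxent_uniform(5))  # converges to [0.2, 0.2, 0.2, 0.2, 0.2]
```

Each update amounts to p ∝ p^(1-eta), so repeated steps flatten any starting distribution toward uniform, matching the principle of indifference.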
The arguments seem to me to be as Bayesian as anything in his building up of Bayesian methods from the Cox-Polya criteria.
I think this is not so important, but it is helpful to think about nonetheless. I guess the first step is to define what is meant by ‘Bayesian’. In my original comment, I took one necessary condition to be that a Bayesian gadget is one which follows from the Cox-Polya desiderata. It might be better to define it as one which uses Bayes’ Theorem. I think in either case, maxent fails to meet the criteria.
Maxent produces the distribution on the sample space which maximizes entropy subject to any known constraints, which presumably come from data. If there are no constraints, then one gets the principle of indifference, which can also be gotten straight out of the Cox-Polya desiderata, as you say. But I think these are two different routes to the same target. Maxent needs something new, namely Shannon’s information entropy (by ‘new’ I mean new with respect to Cox-Polya). Furthermore, the derivation of maxent is really quite different from the derivation of the principle of indifference from Cox-Polya.
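To make the “something new” concrete: once there is a nontrivial constraint, maxent produces a genuinely non-uniform answer that the principle of indifference alone cannot. Here is a sketch along the lines of Jaynes’s dice example (a die whose long-run mean is constrained to 4.5); the function names and the bisection solver are my own choices, not anything from the book:

```python
import numpy as np

def dice_maxent(target_mean, faces=np.arange(1, 7)):
    """Maxent distribution over die faces subject to E[face] = target_mean.

    Lagrange duality gives p_i proportional to exp(lam * i); we solve for
    lam by bisection, since the constrained mean is increasing in lam.
    """
    def mean(lam):
        w = np.exp(lam * faces)
        return (faces * w).sum() / w.sum()

    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if mean(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    w = np.exp(lo * faces)
    return w / w.sum()

p = dice_maxent(4.5)
print(p)  # probabilities tilt toward the high faces
```

With `target_mean=3.5` the same code recovers the uniform 1/6 distribution, i.e. the principle of indifference is the special case where the constraint adds no information.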
I could be completely off here, but I believe the principle-of-indifference argument is generalized by the transformation-group material. I think this because I can see the action of the symmetric group (the group, in the abstract-algebra sense, of permutations) on the hypothesis space in the principle-of-indifference argument. Anyway, hopefully we’ll get up to that chapter!
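A toy check of the symmetric-group point (my own illustration, not the book’s notation): a distribution on a finite hypothesis space is unchanged by every permutation exactly when it is uniform, which is the principle of indifference restated as a symmetry requirement.

```python
from itertools import permutations
import numpy as np

def invariant_under_Sn(p, tol=1e-9):
    """True iff p is unchanged by every permutation of its entries,
    i.e. invariant under the action of the symmetric group S_n."""
    p = np.asarray(p)
    return all(np.allclose(p, p[list(perm)], atol=tol)
               for perm in permutations(range(len(p))))

print(invariant_under_Sn([0.25] * 4))            # True: uniform is invariant
print(invariant_under_Sn([0.4, 0.3, 0.2, 0.1]))  # False: any asymmetry breaks it
```

(Brute-forcing all n! permutations is only sensible for tiny n, but it makes the group action explicit.)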
Upon further study, I disagree with myself here. It does seem that entropy, as a measure of uncertainty in probability distributions, more or less falls out of the Cox-Polya desiderata. I guess that ‘common sense’ desideratum is pretty useful!