So perhaps this is for the next post, but are these ‘metaprobabilities’ just regular hyperparameters?
I was wondering this too. I haven’t looked at this A_p distribution yet (nor have I read all the comments here), but keeping distributions over distributions is the core of Bayesian methods in machine learning. You don’t keep just a single estimate of the probability; you keep a distribution over possible probabilities, exactly as David is saying. I don’t even know how updating your probability distribution in light of new evidence (aka a “Bayesian update”) would work without this.
Am I missing something about David’s post? I did go through it rather quickly.
I’m sure you know more about this than I do! Based on a quick Wiki check, I suspect that formally the A_p are one type of hyperprior, but not all hyperpriors are A_p (a/k/a metaprobabilities).
Hyperparameters are used in Bayesian sensitivity analysis, a/k/a “Robust Bayesian Analysis”, which I recently accidentally reinvented here. I might write more about that later in this sequence.
When you use an underscore in a name, make sure to escape it with a backslash first, like so: `A\_p`.
(This is necessary because underscores are yet another way to make things italic. It only applies to comments, since posts use a different formatting system.)
Thanks! Fixed.
Yeah, from what I’ve seen, distributions mathematically equivalent to A_p distributions are commonly used, but that’s not what they’re called.
Like, I think you might call the case in this problem “a Bernoulli random variable with an unknown parameter”. (The Bernoulli random variable is 1 if it gives you $2 and 0 if it gives you $0.) And then the hyperprior would be the probability distribution of that parameter, I guess? I haven’t really heard that word before.
E. T. Jaynes, of course, would never talk like this, because to him the idea of a random quantity existing in the real world is a mind projection fallacy. Hence, no “random variables”. So he uses the A_p distribution as a way of thinking about the same math without the idea of randomness. Jaynes’s A_p in this case corresponds exactly to the more traditional proposition “the parameter of the Bernoulli random variable is p”.
(BTW, I have a purely mathematical question about the A_p distribution chapter, which I posted to the open thread: http://lesswrong.com/lw/ii6/open_thread_september_28_2013/9pbn. If you know the answer, I’d really appreciate it if you told me.)