It’s true that it’s only possible to find local optima, but that’s true of any algorithm.
Whaaaat? Exhaustive search is an algorithm, and it will find you the global optimum anywhere. For many search-space structures it’s not hard to find the global optimum with an appropriate algorithm.
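For a finite search space the guarantee is easy to demonstrate in a few lines; a minimal sketch (the objective here is a made-up example with several local optima, not anything from the discussion):

```python
# Exhaustive search: evaluate every candidate and keep the best.
# On any finite domain this is guaranteed to return the global optimum.

def exhaustive_search(candidates, objective):
    best = None
    best_value = float("-inf")
    for x in candidates:
        value = objective(x)
        if value > best_value:
            best, best_value = x, value
    return best, best_value

# Example: a bumpy objective with many local optima on an integer grid.
f = lambda x: -(x - 37) ** 2 + 10 * (x % 5)
best_x, best_f = exhaustive_search(range(100), f)
```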
In that sense stochastic gradient descent will also find the global optimum, since the randomness will eventually push it to every possible point. It will just take an eternity, but so will exhaustive search.
It’s also trivial to modify any local algorithm into a global one by occasionally moving to a random point. This is also effective in practice at finding better local optima.
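A minimal sketch of that trick, as random-restart hill climbing on the integers (the objective, neighborhood, and restart count are illustrative assumptions, not anyone’s specific proposal):

```python
import random

def hill_climb(objective, start, steps=200):
    """Greedy local search on the integers: move to a better neighbor."""
    x = start
    for _ in range(steps):
        best = max([x - 1, x + 1], key=objective)
        if objective(best) <= objective(x):
            return x          # local optimum: no neighbor improves
        x = best
    return x

def hill_climb_with_restarts(objective, domain, restarts=20):
    """Occasionally jump to a random point; keep the best local optimum found."""
    best = hill_climb(objective, random.choice(domain))
    for _ in range(restarts):
        x = hill_climb(objective, random.choice(domain))
        if objective(x) > objective(best):
            best = x
    return best
```

With enough restarts over a finite domain, every basin of attraction eventually gets a starting point, so the global optimum is found with probability approaching 1.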
Everything approximates Bayesian inference; it’s just a matter of how good the approximation is. If you have enough data, maximum likelihood approaches Bayesian inference.
I used maximum likelihood as an example: you take the single most probable hypothesis (the parameters of a statistical model) instead of weighing many hypotheses the Bayesian way. If you have enough data, the most probable hypothesis should converge to the correct one.
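The convergence can be seen concretely in a conjugate coin-flip model, where the posterior has a closed form (a sketch; the uniform Beta(1, 1) prior and the 0.7 bias are arbitrary choices):

```python
# Coin flipping: compare the maximum-likelihood estimate with the
# Bayesian posterior mean under a uniform Beta(1, 1) prior.

def mle(heads, n):
    return heads / n

def posterior_mean(heads, n, a=1.0, b=1.0):
    # The posterior is Beta(a + heads, b + tails); this is its mean.
    return (a + heads) / (a + b + n)

# As n grows, the prior's pull (the +a and +b terms) is swamped by the
# data, so the two estimates converge.
for n in (10, 1000, 100000):
    heads = int(0.7 * n)          # pretend the true bias is 0.7
    print(n, mle(heads, n), posterior_mean(heads, n))
```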
There is a view that everything that works must be an approximation of the ideal Bayesian method. This is argued by Yudkowsky in Beautiful Probability and Searching for Bayes-Structure.
You can reformulate many problems in the Bayesian framework. This does not mean that everything is an approximation of Bayesianism—just like the ability to translate a novel into French does not mean that each novel is an approximation of a French roman.
It’s deeper than that. Bayesian probability theory is a mathematical law. Any method that works must be computing an approximation of it, just as Newtonian mechanics is a very close approximation of relativity without being equivalent to it.
Bayesian probability theory is a mathematical law.
That is not true. The Bayes equation is mathematically correct, but a theory is much wider. For example, Bayesians interpret probability as a degree of belief: is that also a mathematical law? And you need a prior to start: what does the “mathematical law” say about priors?
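The equation really is silent about the prior, though with enough data different priors tend to agree; a toy sketch (the two Beta priors and the 0.4 coin bias are arbitrary):

```python
# Bayes' rule says nothing about where the prior comes from.
# Two agents with very different Beta priors update on the same coin flips.

def posterior(a, b, heads, tails):
    return a + heads, b + tails          # conjugate Beta update

def mean(a, b):
    return a / (a + b)

optimist = (20.0, 1.0)    # prior strongly favoring heads (arbitrary choice)
skeptic = (1.0, 20.0)     # prior strongly favoring tails

for flips in (5, 50, 5000):
    heads = int(0.4 * flips)
    tails = flips - heads
    m1 = mean(*posterior(*optimist, heads, tails))
    m2 = mean(*posterior(*skeptic, heads, tails))
    print(flips, round(m1, 3), round(m2, 3))
```

After a handful of flips the two posteriors disagree sharply; after thousands they nearly coincide, since the data swamps the prior.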
Tell me, did Eliezer even address PAC learning in his writing? If not, I would say that he’s being overconfident and ignorant in stating that Bayesian probability is all there is and everything else is a mere approximation.
PAC-learning is definitely something we don’t talk about enough around here, but I don’t see what the conflict is with it being an approximation of Bayesian updating.
Here’s how I see it: you’re updating (approximately) over a limited space of hypotheses that might not contain the true hypothesis. The idea that the best model in your space can still be approximately correct is expressible on both Bayesian and frequentist grounds: the approximate update over models is equivalent to an approximate update over predictions when you expect the universe to be modelable, and the best model also has a good frequency of success over the long run if the real universe is drawn from a sufficiently nice distribution.
But I’m definitely a n00b at this stuff, so if you have other ideas (and reading recommendations) I’d be happy to hear them.
The “update” doesn’t use Bayes’s rule; there’s no prior; there’s no concept of belief. Why should we still consider it Bayesian? I mean, if you consider any learning to be an approximation of Bayesian updating, then sure, PAC-learning qualifies. But that begs the question, doesn’t it?
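For contrast, here is the standard finite-class PAC sample-complexity (Occam) bound in the realizable case; note that nothing in it is a prior or a degree of belief, only the hypothesis-class size and the (ε, δ) guarantee:

```python
import math

def pac_sample_size(hypothesis_count, epsilon, delta):
    """Occam bound for a finite hypothesis class (realizable case):
    with m >= (1/eps) * (ln |H| + ln(1/delta)) samples, any hypothesis
    consistent with the data has true error <= eps with probability
    at least 1 - delta, for ANY data distribution."""
    return math.ceil((math.log(hypothesis_count) + math.log(1 / delta)) / epsilon)

# No prior over hypotheses appears anywhere: only |H|, epsilon, and delta.
m = pac_sample_size(hypothesis_count=10**6, epsilon=0.01, delta=0.001)
```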
You can’t do an exhaustive search on an infinite set.
I haven’t seen any infinite sets in reality.
The set of possible Turing Machines is infinite. Whether you consider that to satisfy your personal definition of “seen” or “in reality” isn’t really relevant.