Ensemble learning methods are a challenge for “prevents overfitting” justifications of Occam’s razor since they propose weirdly complex hypotheses, but suffer less from overfitting than the weak classifiers that they are built from.
It’s not a challenge unless you think that each of the weak classifiers should be powerful enough to do the classification correctly. Occam = “Do not multiply entities beyond need”, not “Do not multiply entities”.
If the scientist, confronted with a long run of “no new particles found”, does switch to the known number of particles, then Nature has extracted a mind change from the scientist without a corresponding particle. The Occam strategy achieves the fewest possible mind changes (that is, a number equal to the number of particles), given such an adversarial Nature.
The Occam strategy isn’t to say that the number of particles that exist equals the number observed. It’s to use the simplest underlying theory. We have theories with fewer moving parts than the number of particles they predict. Maybe you mean “fundamental concepts” when you say “particles”; but in that case, the non-Occam strategy is a straw man, since I don’t think anyone proposes new concepts that provide no additional explanatory power.
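For anyone who wants the quoted mind-change argument in concrete form, here is a rough sketch. Everything in it is an assumption made for illustration: the names `count_mind_changes`, `occam`, and `one_more`, the `patience` parameter, and the hand-built `stream` that stands in for an adversarial Nature (it reveals a new particle only after the non-Occam scientist has given up on the extra one). It takes the “particles” framing at face value and does not address the objection above.

```python
def count_mind_changes(strategy, observations):
    """Count how often a strategy's conjecture changes over a stream.

    observations[t] is the cumulative number of particles seen by step t.
    strategy(seen, quiet_steps) returns the conjectured total number.
    """
    changes = 0
    previous = None
    quiet_steps = 0
    last_seen = 0
    for seen in observations:
        # Reset the quiet-run counter whenever a new particle shows up.
        quiet_steps = 0 if seen > last_seen else quiet_steps + 1
        last_seen = seen
        guess = strategy(seen, quiet_steps)
        if previous is not None and guess != previous:
            changes += 1
        previous = guess
    return changes


def occam(seen, quiet_steps):
    # Conjecture exactly the particles observed so far.
    return seen


def one_more(seen, quiet_steps, patience=3):
    # Hypothetical non-Occam strategy: posit one extra unseen particle,
    # but drop it after `patience` steps without a discovery.
    return seen if quiet_steps >= patience else seen + 1


# Hand-built adversarial stream with two particles: each discovery arrives
# right after the `one_more` scientist has retreated to the observed count.
stream = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]

print("occam:", count_mind_changes(occam, stream))
print("one_more:", count_mind_changes(one_more, stream))
```

On this stream the Occam scientist changes their mind once per particle, while the other scientist pays an extra retraction for each quiet run.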
Can you elaborate on your sentence that begins “It’s not a challenge...”?
My understanding is that if our real answer to “Why do we use Occam’s razor?” were “Because that way we avoid overfitting,” then we would wholeheartedly embrace any future statistical technique that outperformed Occam by proposing weirdly complex hypotheses.
Boosting and bagging merely illustrate that a technique can perform well in a statistical sense even though nobody believes its outputs (large ensembles of rules) are “really out there”, the way we might believe Schrödinger’s equation is “really out there”.
We only use boosting if our set of low-complexity hypotheses does not contain the solution we need. And instead of switching to a larger set of still-low-complexity hypotheses, we do something much cheaper, the second-best thing: we try to find a good hypothesis in the convex hull of the original hypothesis space.
In short, boosting “outperforms Occam” only in man-hours saved: boosting requires less thinking and less work than properly applying Occam’s razor. That really is a good thing, of course.
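To make “a good hypothesis in the convex hull of the original hypothesis space” concrete, here is a minimal AdaBoost-style sketch over decision stumps, standard library only. The toy data, the stump parameterization, and all function names are assumptions chosen for illustration. It also assumes every round finds a stump with weighted error below 1/2, so that each coefficient is positive.

```python
import math

# Toy 1-D data: positive at both ends, negative in the middle,
# so no single threshold rule (stump) labels every point correctly.
X = [0, 1, 2, 3, 4, 5]
Y = [+1, +1, -1, -1, +1, +1]

# Weak hypothesis space: stumps (theta, s) predicting s if x < theta, else -s.
STUMPS = [(t - 0.5, s) for t in range(len(X) + 1) for s in (+1, -1)]


def stump_predict(stump, x):
    theta, s = stump
    return s if x < theta else -s


def adaboost(rounds):
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []  # list of (alpha, stump) pairs
    for _ in range(rounds):
        # Pick the stump with the lowest weighted error (first one wins ties).
        best, best_err = None, float("inf")
        for stump in STUMPS:
            err = sum(wi for wi, x, y in zip(w, X, Y)
                      if stump_predict(stump, x) != y)
            if err < best_err:
                best, best_err = stump, err
        best_err = max(best_err, 1e-12)  # avoid dividing by zero below
        alpha = 0.5 * math.log((1 - best_err) / best_err)
        ensemble.append((alpha, best))
        # Up-weight the examples this stump got wrong, then renormalize.
        w = [wi * math.exp(-alpha * y * stump_predict(best, x))
             for wi, x, y in zip(w, X, Y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def ensemble_predict(ensemble, x):
    # Dividing by the sum of alphas turns the coefficients into a convex
    # combination of stumps; it does not change the sign of the score.
    z = sum(alpha for alpha, _ in ensemble)
    score = sum((alpha / z) * stump_predict(stump, x)
                for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


ensemble = adaboost(rounds=3)
print([ensemble_predict(ensemble, x) for x in X])
```

On this toy set the best single stump gets two of the six points wrong, while the three-stump weighted vote fits all of them. The final rule is still just a weighted average of the same simple stumps, which is the “convex hull” point.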
Nice post! Couple quibbles/questions.