Thermodynamics of learning. As we saw, the only way to obtain more efficient bounds was to introduce restrictions on the target function class. As we will see in the next post, to obtain stronger generalization bounds, we will need to break apart the model class in a similar way. In both cases, the classical approach attempts to study the relevant phenomenon in too much generality, which incurs no-free-lunch-y effects that prevent you from obtaining strong guarantees.
But by breaking these classes down into more manageable subclasses, analogous to how thermodynamics breaks down the phase space into macrostates, we approach much stronger guarantees. As we’ll find out in the rest of this sequence, the future of learning theory is physics.
This is a very interesting point.
Though can you elaborate on “incurs no-free-lunch-y effects that prevent you from obtaining strong guarantees”? I can’t quite parse the meaning.
The No Free Lunch Theorem says that "any two optimization algorithms are equivalent when their performance is averaged across all possible problems."
So if the class of target functions (= the set of possible problems you would want to solve) is very large, then it's hard for any given model class (= set of solutions) to do much better than any other model class. You can't obtain strong guarantees for why you should expect good approximation.
If the target function class is smaller and your model class is big enough, you might have better luck.
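To make the averaging claim concrete, here's a minimal Python sketch (a toy setup of my own, not something from the post): it enumerates every Boolean target function on a tiny input space and checks that a fixed learner's accuracy on unseen points, averaged over all those targets, comes out to exactly chance.

```python
import itertools

# Toy no-free-lunch illustration (hypothetical setup): inputs are 3-bit strings,
# and the target class ranges over ALL 2^8 Boolean functions on them.
# Averaged over every possible target, any fixed learner's accuracy on
# points outside the training set is exactly 0.5.

inputs = list(itertools.product([0, 1], repeat=3))
train, test = inputs[:4], inputs[4:]  # fixed train/test split

def learner(train_labels, x):
    """A stand-in learner: predict the majority training label everywhere."""
    return int(sum(train_labels) * 2 >= len(train_labels))

total_acc = 0.0
n_functions = 0
# Enumerate every target function as an assignment of labels to all 8 inputs.
for labels in itertools.product([0, 1], repeat=len(inputs)):
    f = dict(zip(inputs, labels))
    train_labels = [f[x] for x in train]
    acc = sum(learner(train_labels, x) == f[x] for x in test) / len(test)
    total_acc += acc
    n_functions += 1

print(total_acc / n_functions)  # -> 0.5, no matter how `learner` is defined
```

Swapping in any other learner leaves the printed average at 0.5, which is the "no-free-lunch-y effect": with an unrestricted target class, no choice of model class can be guaranteed to approximate well on average.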