Regarding myth 5 and online learning, I don’t think the average regret bound is as awesome as you claim. The bound is √((log n) / T). But if there are really no structural assumptions, then you should be considering all possible models, and the number of possible models for T rounds is exponential in T, so the bound ends up being on the order of 1, which is the worst possible average regret under any strategy. With no assumptions of structure, there is no meaningful guarantee on the real accuracy of the method.
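To make the arithmetic concrete, here is a quick numerical check (my own sketch, with constant factors dropped) of how √((log n) / T) behaves for a small fixed collection of models versus “all possible models”, i.e. n = 2^T:

```python
import math

def avg_regret_bound(log_n, T):
    # Average-regret bound sqrt(log(n) / T) for n models over T rounds,
    # ignoring constant factors.
    return math.sqrt(log_n / T)

T = 1000
# A small, fixed collection of models: the bound shrinks as T grows.
print(avg_regret_bound(math.log(10), T))      # ~0.05
# "All possible models": one model per possible sequence of T binary
# predictions, so n = 2**T and log(n) = T*log(2).  The bound is then
# sqrt(log 2) ~ 0.83 no matter how large T gets -- a vacuous guarantee.
print(avg_regret_bound(T * math.log(2), T))   # ~0.83
```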
The thing that is awesome about the bound’s guarantee is that if you assume some structure, and choose a subset of possible models based on that structure, you know you get real accuracy if your structural assumptions hold.
So this method doesn’t really avoid relying on structural assumptions; it just punts the question of which structural assumption to make to the choice of models to run the method over. This is pretty much the same as Bayesian methods putting the structural assumptions in the prior, and choosing a collection of models seems to be an approximation of choosing a prior, though a less powerful one: instead of assigning models probabilities in a continuous range, it either includes a model or doesn’t.
and the number of possible models for T rounds is exponential in T
??? Here n is the number of other people betting. It’s a constant.
If you wanted to, you could create “super-people” that mix and match the bets of other people depending on the round. Then the number of super-people grows exponentially in T, and without further assumptions you can’t hope to be competitive with such “super-people”. If that’s what you’re saying, then I agree with that.
And I agree with the broader point that in general you need to make structural assumptions to make progress. The thing that’s awesome about the regret bound is that it does well even in the presence of correlated, non-i.i.d., maybe even adversarial data, and even if the “true hypothesis” isn’t in the family of models we consider.
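For concreteness, here is a minimal sketch (my own, not anything from the post) of the kind of algorithm that gives this guarantee: multiplicative weights / Hedge over a fixed set of n experts with 0/1 loss. Note that nothing in it assumes the outcomes are i.i.d. or that any expert is “true”:

```python
import math

def hedge(expert_predictions, outcomes, eta=None):
    """Multiplicative-weights (Hedge) learner over n experts, 0/1 loss.

    expert_predictions: list of T lists, each holding n predictions in {0, 1}
    outcomes:           list of T outcomes in {0, 1}
    Returns (learner_loss, best_expert_loss).  The guarantee is
    learner_loss/T - best_expert_loss/T <= O(sqrt(log(n) / T)),
    with no i.i.d. or "true model" assumption on the outcomes.
    """
    T = len(outcomes)
    n = len(expert_predictions[0])
    if eta is None:
        eta = math.sqrt(8 * math.log(n) / T)  # standard learning rate
    weights = [1.0] * n
    learner_loss = 0.0
    expert_loss = [0.0] * n
    for preds, y in zip(expert_predictions, outcomes):
        total = sum(weights)
        # Expected loss of the learner that follows expert i with
        # probability weights[i] / total: the total weight of wrong experts.
        learner_loss += sum(w for w, p in zip(weights, preds) if p != y) / total
        # Downweight each expert in proportion to its loss this round.
        for i, p in enumerate(preds):
            loss = 1.0 if p != y else 0.0
            expert_loss[i] += loss
            weights[i] *= math.exp(-eta * loss)
    return learner_loss, min(expert_loss)
```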
and the number of possible models for T rounds is exponential in T
??? Here n is the number of other people betting. It’s a constant.
Within a single application of online learning, n is a constant, but that doesn’t mean we can’t look at the consequences of it having particular values, even values that vary with other parameters. But you seem to be agreeing with the main points: that if you use all possible models (or “super-people”) the regret bound is meaningless, and that in order to shrink the collection of models enough for the bound to mean something, while still keeping a model good enough to be worth performing almost as well as, you need structural assumptions.
even if the “true hypothesis” isn’t in the family of models we consider
I agree you don’t need a model that is right every round, but you do need a model that is right in a lot of rounds. You don’t need a perfect model, but you do need a model that is as accurate as you want your end results to be.
maybe even adversarial data
I think truly adversarial data gives a result that is within the regret bounds, as guaranteed, but still uselessly inaccurate because the data is adversarial against the collection of models (unless the collection is so large you aren’t really bounding regret).
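A toy version of this point, reusing the hedge() sketch above (again my own illustration): two constant experts against an alternating outcome sequence that neither can beat. The regret bound holds, but the best expert is only right half the time, so the learner is too:

```python
# Two constant experts ("always predict 0" and "always predict 1") against
# outcomes that alternate 0, 1, 0, 1, ...  Each expert is wrong half the
# time, so the best-expert loss is ~0.5 per round; the learner stays within
# O(sqrt(log(2)/T)) of that, satisfying its regret bound while being no
# better than a coin flip in absolute terms.
T = 10_000
expert_predictions = [[0, 1] for _ in range(T)]
outcomes = [t % 2 for t in range(T)]
learner_loss, best_loss = hedge(expert_predictions, outcomes)
print(learner_loss / T, best_loss / T)   # both close to 0.5
```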