It seems like, at the end of a fairly complicated construction process, if you wind up with a model that outperforms, your prior should be that you managed to sneak in overfitting without realizing it rather than that you actually have an edge, right? Even if, say, you wound up with something that seemed safe because it had low variance in the short run, you’d suspect that you had managed to push the variance out into the tails. How would you determine how much testing would be needed before you were confident placing bets of appreciable size? I’m guessing there’s stuff related to structuring your stop losses here that I don’t know about.
Yes, avoiding overfitting is the key problem, and you should expect almost anything to be overfit by default. We spend a lot of time on this (I work w/Alexei). I’m thinking of writing a longer post on preventing overfitting, but these are some key parts:
Theory. Something that makes economic sense, or has worked in other markets, is more likely to work here.
Components. A strategy made of 4 components, each of which can be independently validated, is a lot more likely to keep working than one black box.
Measuring strategy complexity. If you explore 1,000 possible parameter combinations, that’s less likely to work than if you explore 10 (see the sketch after this list).
Algorithmic decision making. Any manual part of the process introduces a lot of possibilities for overfitting.
Abstraction & reuse. The more you reuse things, the fewer degrees of freedom you have with each idea, and therefore the lower your chance of overfitting.
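Here’s the sketch referenced in the complexity bullet: a minimal simulation, with entirely made-up numbers, of why searching many parameter combinations inflates backtest results. On pure noise, the best of 1,000 random “strategies” almost always looks better than the best of 10, even though none of them has any edge.

```python
import numpy as np

rng = np.random.default_rng(0)
n_days = 500  # length of the hypothetical backtest

def best_sharpe(n_strategies):
    """Best annualized Sharpe among n_strategies strategies with zero true edge."""
    returns = rng.normal(0.0, 0.01, size=(n_strategies, n_days))  # pure noise
    sharpes = returns.mean(axis=1) / returns.std(axis=1) * np.sqrt(252)
    return sharpes.max()

print("best of 10   :", round(best_sharpe(10), 2))
print("best of 1000 :", round(best_sharpe(1000), 2))
```

The more combinations you try, the better the best backtest looks even when every candidate is worthless, which is exactly the risk the complexity bullet describes.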
I’d be interested to learn more about the “components” part.

As an example, consider a strategy like “on Wednesdays, the market is more likely to have a large move, and signal XYZ predicts big moves accurately.” You can encode that as an algorithm: trade signal XYZ on Wednesdays. But the algorithm might make money in backtests even if the assumptions are wrong! By examining the individual components rather than just whether the algorithm made money, we get a better idea of whether the strategy works.
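To make the components idea concrete, here is a minimal sketch of validating each assumption separately instead of only backtesting the combined rule. Everything here is hypothetical: the column names, the boolean `signal_xyz`, and the “top decile” threshold are illustrative assumptions, not how this is actually done in practice.

```python
import pandas as pd

def validate_components(df: pd.DataFrame) -> None:
    """df: one row per bar, with columns 'weekday', 'abs_move', 'signal_xyz' (bool), 'ret'."""
    # Component 1: are Wednesday moves actually larger than moves on other days?
    wed_move = df.loc[df["weekday"] == "Wed", "abs_move"].mean()
    other_move = df.loc[df["weekday"] != "Wed", "abs_move"].mean()
    print(f"mean |move|: Wednesday {wed_move:.4f} vs other days {other_move:.4f}")

    # Component 2: does signal XYZ predict big moves at all, on any day?
    big_move = df["abs_move"] > df["abs_move"].quantile(0.9)
    hit_rate = (df["signal_xyz"] & big_move).mean() / df["signal_xyz"].mean()
    print(f"P(top-decile move | signal XYZ): {hit_rate:.2%}")

    # The combined rule (what a naive backtest measures): trade XYZ on Wednesdays
    pnl = df.loc[(df["weekday"] == "Wed") & df["signal_xyz"], "ret"].sum()
    print(f"combined-rule backtest PnL: {pnl:.4f}")
```

If the combined rule shows a profit but either component check fails, the backtest result is more likely luck than edge.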
Is this an instance of the “theory” bullet point then? Because the probability of the statement “trading signal XYZ works on Wednesdays, because [specific reason]” cannot be higher than the probability of the statement “trading signal XYZ works” (the first statement involves a conjunction).
It’s a combination. The point is to throw out algorithms/parameters that do well on backtests when the assumptions are violated, because those are much more likely to be overfit.
Yes to everything Satvik said, plus: it helps if you’ve tested the algorithm across multiple different market conditions. E.g. in this case we’ve looked at 2017 and 2018 and 2019, each having a pretty different market regime. (For other assets you might have 10+ years of data, which makes it easier to be more confident in your findings since there are more crashes + weird market regimes + underlying assumptions changing.)
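One hedged way to operationalize that regime check, assuming you have the strategy’s daily returns as a date-indexed series (the per-year grouping and the “positive in every year” acceptance rule are just illustrative choices, not a prescribed method):

```python
import numpy as np
import pandas as pd

def sharpe_by_year(daily_returns: pd.Series) -> pd.Series:
    """Annualized Sharpe of a date-indexed daily return series, grouped by calendar year."""
    ann = lambda r: r.mean() / r.std() * np.sqrt(252)
    return daily_returns.groupby(daily_returns.index.year).apply(ann)

# Usage (hypothetical): only trust the strategy if it held up in every regime tested.
# per_year = sharpe_by_year(strategy_returns)
# accept = (per_year > 0).all()
```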
But you’re also getting at an important point I was hinting at in my homework question:
We’re predicting up bars, but what we ultimately want is returns. What assumptions are we making? What should we consider instead?
Basically, it’s possible that we predict the sign of the bar with 99% accuracy but still lose money. This would happen if, every time we get the prediction right, the price movement is relatively small, but every time we get it wrong, the price moves a lot and we lose money.
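For instance (made-up numbers): if you’re right 99% of the time and make 0.1% when right, but lose 15% in the 1% of cases where you’re wrong, the expected return per trade is 0.99 × 0.1% − 0.01 × 15% ≈ −0.05%, so the strategy bleeds money despite 99% sign accuracy.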
Stop losses can help. Another way to mitigate this is to run a lot of uncorrelated strategies. Then even if market conditions become particularly adversarial for one of your algorithms, you won’t lose too much money, because other algorithms will continue to perform well: https://www.youtube.com/watch?v=Nu4lHaSh7D4
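A minimal sketch of why the uncorrelated-strategies point helps, under the idealized assumption that the strategies’ returns really are uncorrelated with equal volatility (in practice correlations are rarely zero and tend to rise in stressed markets, so the real benefit is smaller):

```python
import numpy as np

rng = np.random.default_rng(1)
n_days, vol = 2000, 0.01

for n_strategies in (1, 4, 16):
    # Independent strategy returns with identical volatility (zero correlation by construction)
    returns = rng.normal(0.0, vol, size=(n_days, n_strategies))
    portfolio = returns.mean(axis=1)  # equal-weight allocation across strategies
    print(n_strategies, "strategies -> portfolio vol ~", round(portfolio.std(), 4))

# Expect roughly 0.0100, 0.0050, 0.0025: portfolio volatility falls like 1/sqrt(N).
```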
That sounds equivalent to the Kelly criterion: most of your bankroll is in a low-variance strategy, and some proportion of your bankroll is spread across strategies with varying amounts of higher variance. Is there any existing work on Kelly optimization over distributions rather than points?
edit: full Kelly allows you to get up to 6 outcomes before you’re in 5th-degree polynomial land, which is no fun. So I guess you need to choose your points well. http://www.elem.com/~btilly/kelly-criterion/
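On the 6-outcomes / 5th-degree-polynomial point: one way to sidestep the polynomial entirely is to maximize expected log growth numerically over the bet fraction, which handles any discrete (or sampled) outcome distribution. A minimal sketch with made-up outcomes and probabilities:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical per-unit-stake outcomes and their probabilities (any number of outcomes works)
outcomes = np.array([-1.0, -0.5, 0.2, 0.5, 1.0, 3.0])
probs = np.array([0.05, 0.15, 0.30, 0.30, 0.15, 0.05])

def neg_log_growth(f):
    # Kelly objective: E[log(1 + f * outcome)], negated because we minimize
    return -np.sum(probs * np.log(1.0 + f * outcomes))

# The fraction must keep 1 + f*outcome positive; the worst outcome here is -1, so f < 1.
res = minimize_scalar(neg_log_growth, bounds=(0.0, 0.99), method="bounded")
print("Kelly fraction ~", round(res.x, 3))
```

The same numerical approach extends to a full return distribution by replacing the sum with an average over samples.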
Good question. I don’t know.