Why quantitative finance is so hard
This is not financial advice.
Quantitative finance (QF) is the art of using mathematics to extract money from a securities market. A security is a fungible financial asset. Securities include stocks, bonds, futures, currencies, cryptocurrencies and so on. People often use the techniques of QF to extract money from prediction markets too, particularly sports betting pools.
Expected return is the sum of future outcomes weighted by their probabilities. A trade has edge if its expected return is positive. You should never make a trade with negative expected return. But expected return alone is not enough. Most people's value functions curve downward: the marginal value of money decreases the more you have. Most people have approximately logarithmic value functions.
A logarithmic curve is approximately linear when you zoom in. Losing 1% of your net worth hurts you slightly more than earning 1% of your net worth helps you. But the difference is usually small enough to ignore. The difference between earning 99% of your net worth and losing 99% of your net worth is not ignorable.
Suppose you gain 1% of your net worth and then lose 1% of it. The logarithm of your wealth changes by $\ln(1.01) + \ln(0.99) = \ln(0.9999)$, a tiny $-0.01\%$. Now suppose you gain 99% and then lose 99%: the logarithm of your wealth changes by $\ln(1.99) + \ln(0.01) \approx -392\%$, leaving you with about 2% of what you started with.
This penalty that volatility imposes on log-wealth is called a risk premium. For every positive edge you can use the Kelly criterion to calculate a bet size small enough that your edge exceeds your risk premium. In practice traders tend to use fractional Kelly.
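Here is a minimal sketch of full and fractional Kelly for a binary bet (the 55% coin flip at even odds and all other numbers are invented for illustration):

```python
import math

def kelly_fraction(p, b):
    """Kelly-optimal fraction of bankroll for a binary bet.

    p: probability of winning
    b: net odds (you win b dollars per dollar staked)
    """
    return p - (1 - p) / b

def expected_log_growth(f, p, b):
    """Expected change in log-wealth per bet when staking fraction f."""
    return p * math.log(1 + f * b) + (1 - p) * math.log(1 - f)

p, b = 0.55, 1.0                  # coin flip with a 5% edge at even odds
full = kelly_fraction(p, b)       # 0.10: stake 10% of your bankroll
half = 0.5 * full                 # "fractional Kelly": half the stake

print(expected_log_growth(full, p, b))  # ~ +0.0050 log-wealth per bet
print(expected_log_growth(half, p, b))  # ~ +0.0038: 75% of the growth
print(expected_log_growth(0.50, p, b))  # ~ -0.0889: overbetting a winning
                                        # bet still destroys wealth
```

Half Kelly keeps about 75% of the growth rate with roughly a quarter of the variance. Since $p$ and $b$ are only estimates, and overestimating them puts you on the overbetting side of the curve, practitioners shade their bets down.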
Minimum transaction costs are often constant. So it is not sufficient for your edge merely to exceed your risk premium; it must exceed your risk premium plus the transaction cost. Risk premium is defined as a fraction of your net worth, but minimum transaction costs are often a constant dollar amount. If you have lots of money then you can place larger bets while keeping your risk premium constant, so the constant cost shrinks as a fraction of each bet. This is one of the reasons hedge funds like having large war chests. Larger funds can harvest risk-adjusted returns from smaller edges.
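A toy calculation (all numbers invented) makes the point:

```python
edge = 0.002          # 0.2% expected return on the position
risk_premium = 0.001  # 0.1% of the position, set by your bet sizing
fee = 10.0            # flat $10 minimum transaction cost

for position in (1_000, 100_000, 10_000_000):
    hurdle = risk_premium + fee / position  # fee shrinks as a fraction of size
    print(f"${position:>10,}: tradeable = {edge > hurdle}")

# $     1,000: tradeable = False  (the flat fee alone is 1% of the position)
# $   100,000: tradeable = True
# $10,000,000: tradeable = True
```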
Getting an Edge
The only free lunch in finance is diversification. If you invest in two uncorrelated assets with equal edge then your risk goes down. This is the principle behind index funds. If you know you’re going to pick stocks with the skill of a monkey then you might as well maximize diversification by picking all the stocks. As world markets become more interconnected they become more correlated too. The more people invest in index funds, the less risk-adjusted return diversification buys you. Nevertheless, standard investment advice for most[1] people is to invest in bonds and index funds. FEMA recommends you add food and water.
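A quick simulation sketch of the effect (the 5% mean and 20% standard deviation are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two uncorrelated assets with identical edge: 5% mean, 20% stdev.
a = rng.normal(0.05, 0.20, size=1_000_000)
b = rng.normal(0.05, 0.20, size=1_000_000)
portfolio = 0.5 * a + 0.5 * b

print(a.mean(), a.std())                  # ~0.05, ~0.200
print(portfolio.mean(), portfolio.std())  # ~0.05, ~0.141 = 0.20 / sqrt(2)
# Same expected return, risk cut by a factor of sqrt(2).
```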
All of the above is baseline. The baseline rent you can extract by mindlessly owning the means of production is called beta. Earning money in excess of beta by beating the market is called alpha.
There are three ways to make a living in this business: be first, be smarter or cheat.
―John Tuld in Margin Call
You can be first by being fast or by using alternative data. Spread Networks laid a $300 million fiber-optic cable in close to a straight line from New York City to Chicago. Being fast is expensive. If you use your own satellites to predict crop prices then you can beat the market. Alternative data is expensive too.
If you want to cheat, go listen to Darknet Diaries. Prison is expensive.
Being smart is cheap.
Science will not save you
Science [ideal] applies Occam’s Razor to distinguish good theories from bad. Science [experimental] is the process of shooting a firehose of facts at hypotheses until only the most robust survive. Science [human institution] works when you have lots of new data coming in. If the data dries up then science [human institution] stops working. Lee Smolin asserts this has happened to theoretical physics.
If you have two competing hypotheses with equal prior probability then you need one bit of entropy to determine which one is true. If you have four competing hypotheses with equal prior probability then you need two bits of entropy to determine which one is true. I call your prior-probability-weighted set of competing hypotheses a hypothesis space. To determine which hypothesis in the hypothesis space is true you need training data. The entropy of your training data must exceed the entropy of your hypothesis space.
The entropy of $n$ competing hypotheses with equal prior probability is $\log_2 n$. Suppose your training dataset has entropy $E$. The number of competing hypotheses you can handle grows exponentially as a function of $E$: $n = 2^E$.
The above equation only works if all the variables in each hypothesis are hard-coded. A hypothesis $f(x) = 2x$ counts as a separate hypothesis from $f(x) = 3x$.
A hypothesis can instead use tunable parameters. Tunable parameters eat up the entropy of our training data fast. You can measure the entropy of a hypothesis by counting how many tunable parameters it has. A one-dimensional linear model $f(x) = a_1 x + a_0$ has two tunable parameters. A one-dimensional quadratic model $f(x) = a_2 x^2 + a_1 x + a_0$ has three tunable parameters. A one-dimensional cubic model $f(x) = a_3 x^3 + a_2 x^2 + a_1 x + a_0$ has four tunable parameters. Suppose each tunable parameter has $\beta$ bits of entropy. Then the entropy of a hypothesis space with $m$ tunable parameters equals $m\beta$, and $m\beta$ is the total entropy needed to collapse it.
We can combine these equations. Suppose your hypothesis space has $n$ separate hypotheses, each with $m$ tunable parameters. The total entropy needed equals the entropy necessary to distinguish the hypotheses from each other plus the entropy necessary to tune a single hypothesis's parameters. Your training data must supply $E \geq \log_2 n + m\beta$ bits.
Logarithmic functions grow slower than linear functions. The number of hypotheses $n$ is inside the logarithm. The number of tunable parameters $m$ is outside of it. The entropy of our hypothesis space is dominated by $m\beta$. Distinguishing fixed hypotheses is cheap: the training entropy required grows only logarithmically in $n$, so you can separate exponentially many competing hypotheses by throwing training data at a problem, provided they have few tunable parameters. Tuning is expensive: every extra parameter costs another $\beta$ bits, so the entropy required to collapse your hypothesis space goes up fast with $m$.
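As a rough numeric sketch of the combined equation (the 16 bits per parameter is an arbitrary assumption):

```python
import math

def entropy_to_collapse(n, m, beta):
    """Training entropy needed to collapse a hypothesis space:
    n hypotheses, m tunable parameters each, beta bits per parameter."""
    return math.log2(n) + m * beta

beta = 16  # assume pinning down one parameter takes 16 bits

print(entropy_to_collapse(2, 0, beta))          # 1.0 bit
print(entropy_to_collapse(1_000_000, 0, beta))  # ~19.9 bits: a million fixed
                                                # hypotheses are still cheap
print(entropy_to_collapse(2, 10, beta))         # 161 bits: ten knobs cost more
                                                # than a million fixed hypotheses
```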
If you have lots of entropy in your training data then you can train a high-parameter model. Silicon Valley gets away with using high-parameter models to run its self-driving cars and image classifiers because it is easy to create new data. There is so much data available that Silicon Valley data scientists focus their attention on compute efficiency.
Wall Street is the opposite. Quants are bottlenecked by training data entropy.
Past performance is not indicative of future results
If you are testing a drug, training a self-driving car or classifying images then past performance tends to be indicative of future results. If you are examining financial data then past performance is not indicative of future results. Consider a financial bubble. The price of tulips goes up. It goes up some more. It keeps going up. Past performance indicates the price ought to keep going up. Yet buying into a bubble has negative expected return.
Wikipedia lists 25 economic crises in the 20th century plus 20 in the 21st century to date for a total of 45. Financial crises matter. Hedge funds tend to be highly leveraged. A single crisis can wipe out a firm. If a strategy cannot ride out financial crises then it is unviable. Learning from your mistakes does not work if you do not survive your mistakes.
When Tesla needs more training data for its self-driving cars, it can drive more cars around. If a hedge fund needs 45 more financial crises to train its model, it has to wait a century. World conditions change. Competing actors respond to the historical data. New variables appear faster than new training data. You cannot predict financial crises just by waiting for more training data, because the entropy of your hypothesis space outraces the entropy of your training data.
You cannot predict a once-in-history event by applying a high-parameter model to historical data alone.
[1] If your government subsidizes mortgages or another kind of investment then you may be able to beat the market.
Comments

Small nitpicks:
> You should never make a trade with negative expected return.

No!
You explain why in your post, but let me spell it out more explicitly. Diversification means that adding a negatively returning, negatively correlated asset to a portfolio can INCREASE the portfolio's return. Let's say we have two assets: “market” and “insurance”. Market returns 11%/year 9/10 years, and is down 50% the other year. Insurance returns −3%/year 9/10 years, and is up 22% the other year. Expected market returns are 5%/2.5% (simple mean / compounded); insurance returns are −0.5%/−0.75% (mean / compounded). By your logic you should never buy the insurance, and yet if we have a portfolio which maintains a 15% allocation to our insurance asset, our expected (compounded) returns increase.
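Here is a quick script to verify these numbers (the yearly-rebalanced 85/15 blend is as described above; the exact compounded figures come out near 2.5%, −0.75% and 2.7%):

```python
# The assumptions from the example above:
market = [1.11] * 9 + [0.50]     # +11% in 9 of 10 years, -50% in the tenth
insurance = [0.97] * 9 + [1.22]  # -3% in 9 of 10 years, +22% in the tenth

def cagr(yearly):
    """Compound annual growth rate of a list of yearly gross returns."""
    total = 1.0
    for r in yearly:
        total *= r
    return total ** (1 / len(yearly)) - 1

# 85% market / 15% insurance, rebalanced every year.
blend = [0.85 * m + 0.15 * i for m, i in zip(market, insurance)]

print(cagr(market))     # ~ +2.5%  market alone
print(cagr(insurance))  # ~ -0.75% insurance alone: negative edge
print(cagr(blend))      # ~ +2.7%  the blend beats the market alone
```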
Here is a concrete real-world example: (60/40 + tail hedge).
Another way to reason about this is: there’s nothing special about zero nominal returns. So if you shouldn’t make a trade with negative expected return, you should be able to say the same thing about ~any return and by extension you should only put your $ in the highest returning asset… but that misses the whole value of diversification!
> The *only* free lunch in finance is diversification.

(Emphasis mine) This is a strong claim, which I would dispute. Risk premiums (not in the sense you’ve used the word, but in the sense I understand it to mean) are an obvious example—some assets have a positive yield just for holding them, even after accounting for volatility… Leverage would be another example.
> This is the principle behind index funds.

Kinda, sorta, maybe. It’s “a” principle behind them, but if diversification were the only concern, why would you want cap-weighted index funds? Why not equal-sector weights or equal-company weights or some other weights?
> Being smart is cheap.

I can only assume you’ve never attempted to hire people to work for a quantitative hedge fund. If anything, this claim would undercut your main claim that QF is “so hard”. Unfortunately (or fortunately for your thesis) being smart is really expensive.
Possible takeaways from the “hypothesis space” principle, if I’m understanding it correctly:
Keep m as small as you can. Trading strategies should be as simple as reasonably possible, with just a few simple rules. The more complex you make it, the more likely you’re deceiving yourself by over-fitting to noise. More rules are more ways to mess up.
Most big-data techniques choke on this much noise. You can’t just throw a machine-learning algorithm at market data and expect it to find much. (The first time I tried this, the strategy it came up with was “HODL!”)
Make n as big as you can. Gather as much data as you can. It can help to trade ensembles of securities if they show a similar inefficiency in their price data. You can profit from an edge like this even if you don’t know exactly where it is. (I’m currently trading a forex strategy this way.)
When backtesting a strategy, you’re trying to prove your hypothesis wrong, to see if perturbations make the “edge” disappear, not trying to optimize on past data. Any monkey can overfit to noise and make a backtest look good.
Pairs, baskets or index funds seem to be easier to trade profitably than their individual components.
Yes. This bit is counterintuitive—especially after we have established that you want to keep m as small as possible. By making n as big as you can you are riding the profitable end of a logarithmic curve. By increasing m you are crashing into a combinatorial explosion.
> Quants are bottlenecked by training data entropy.

Very true and often a big surprise to people. This is one reason people focus on high frequency trades—more data.
> diversification … free lunch
Mostly true, but at the risk of pedantry it is not quite free. It is quite hard to find 10 good trades and far harder to find 100. Diversification can dilute alpha.
> World conditions change. Competing actors respond to the historical data.
Imagine how hard physics would be if the laws of physics changed whenever you got close to a theory of everything. Or if theories usually stopped working as soon as they were published.
The title is “Why quantitative finance is so hard” but it misses the main reason why quantitative finance is hard:
The competition is brutal.