Crossposted from the Metaculus Journal with minor modifications. The original is available here, and you can forecast on the questions cited in the essay yourself after signing up.
From 1870 to 2015, stocks in the United States returned around 8.4% per year in real (inflation adjusted) terms, while short-term government bonds (called Treasury bills in the US) returned only around 2.1% per year. Though the United States is somewhat of an outlier in how well its equity markets have performed in this period, the picture in other countries isn’t much different: stocks have much higher returns than bonds. This fact is called the equity premium puzzle, and I’ll get into why exactly it’s so puzzling in this essay. I note here that while the discussion about the equity premium puzzle often focuses on broad stock indices, the puzzle is actually present in all asset markets: junk bonds, real estate, foreign exchange, et cetera.
If the difference between 8.4% and 2.1% per year looks small to you, remember that the logic of compounding works such that at these rates of return an initial investment into stocks (with cash from dividends, buybacks, etc. reinvested into the same portfolio) would double in value roughly every 8.6 years, while the same investment into bonds would double roughly every 33 years. If the time horizon is on the order of a few decades, the difference between the value of the two portfolios becomes enormous. This, and many other interesting findings, are available in The Rate of Return on Everything.
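The compounding arithmetic can be checked with a few lines of Python. This is only a sketch that assumes constant annual real returns equal to the historical averages quoted above; the function name and the 50-year horizon are chosen for illustration.

```python
import math

def doubling_time(annual_return: float) -> float:
    """Years needed for a fully reinvested portfolio to double
    at a constant annual rate of return."""
    return math.log(2) / math.log(1 + annual_return)

stocks = doubling_time(0.084)  # real return on US stocks, 1870-2015
bills = doubling_time(0.021)   # real return on Treasury bills, 1870-2015

# Growth of one unit of wealth over an illustrative 50-year horizon.
horizon = 50
stock_value = 1.084 ** horizon
bill_value = 1.021 ** horizon

print(f"doubling time: stocks {stocks:.1f} years, bills {bills:.1f} years")
print(f"after {horizon} years: stocks x{stock_value:.1f}, bills x{bill_value:.1f}")
```

Over 50 years the stock portfolio grows by a factor in the mid-fifties while the bill portfolio does not even triple.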
The most common explanation offered for this difference in returns is that stocks are riskier and therefore it’s natural that they command a higher rate of return: people who buy stocks are compensated for the risk that they are taking on. To see why this explanation is by itself not sufficient to explain anything, it’s enough to notice that a short position on the stock market is at least as risky as (and, in fact, riskier than) a long position, and yet it has a deeply negative mean return. Aside from being short the S&P 500, you can probably think of plenty of other risks you could take for which you would not be rewarded: if you jump from the top floor of a tall building and hope to make it safely to the ground, you take an enormous risk and nobody will reward you for doing so.
A better explanation is not that people are rewarded for taking on risk as such, but that they are rewarded for taking on risk that other people are willing to pay a premium to insure against. If you’re buying insurance against your house burning down, you might not care that much if the purchase is of negative expected value, because in exchange for that you get to unload the risk of your house burning down on a counterparty. If the terms of the contract are good enough, that might well be a good decision from your point of view, since your house burning down would be very bad for you.
However, this still leaves us with a mystery: if the insurance industry is competitive, then even if people on the demand side are willing to pay large premiums to insure their houses against fires, supply competition should cut down premiums to some small markup over expected cost. This is because while it’s risky for you as an individual to hold the risk of your house burning down, if a big insurance or reinsurance company holds a bundle of ten thousand similar contracts which aren’t correlated with each other, they successfully diversify their portfolio and bring down the risk they take by a lot while maintaining the same expected return.
The only way that premiums can get to be much higher than expected cost is therefore if something goes wrong in the argument from the preceding paragraph. The obvious candidate is the assumption that it is possible at all to construct a portfolio of ten thousand different insurance claims which are uncorrelated. For example, a big forest fire or a heat wave likely increases the chance that any of the ten thousand houses burns down. How much risk we can diversify away depends on how much of the risk is idiosyncratic (specific to an individual) and how much of it is systemic (shared across all individuals). Only systemic risks can account for large differences between the premiums charged by insurance companies and the expected cost of the events that are being insured against.
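The difference between idiosyncratic and systemic risk can be made concrete with the standard formula for the variance of an average of equally correlated risks. The sketch below assumes every contract has the same volatility and the same pairwise correlation, which is a simplification; the function name and the 0.1 correlation are illustrative.

```python
import math

def portfolio_std(n: int, sigma: float, corr: float) -> float:
    """Standard deviation of the average payout across n contracts, each
    with volatility sigma and common pairwise correlation corr."""
    variance = sigma ** 2 * (corr + (1 - corr) / n)
    return math.sqrt(variance)

sigma = 1.0  # per-contract risk, normalized to one
for corr in (0.0, 0.1):
    for n in (1, 100, 10_000):
        std = portfolio_std(n, sigma, corr)
        print(f"corr={corr:.1f}  n={n:>6}  std of average payout={std:.3f}")
```

With zero correlation the risk per contract shrinks like one over the square root of the number of contracts, while with any positive correlation it levels off at sigma times the square root of the correlation no matter how many contracts are pooled.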
If we apply this insight to the equity premium puzzle, we see that an equity premium makes sense if buying stocks means you take on some systemic risk which other people are willing to pay you a premium to bear. Again, the most obvious candidate here is the risk of economic recession. Stocks tend to do well when the economy is doing well and badly when the economy is doing badly, and we can think of “the state of the economy” as a systemic risk: there’s no way to diversify away the risk that there will simply be less GDP to go around. So we can at least explain why stocks have a higher return than bonds in qualitative terms, which is encouraging.
Unfortunately for our simple story, further problems begin to crop up when we look at not just the existence of the equity premium but its magnitude, especially compared to the amount of risk that is being taken. The standard deviation of real stock returns in the US over the same period 1870-2015 was around 20% per year. If we try to square this amount of risk with the magnitude of the gap between equity and Treasury bill returns, we end up having to postulate absurdly high levels of risk aversion and these postulates mean our models fail to fit other findings about asset returns, for example the relative stability of riskless short-term rates of return.
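To get a feel for what “absurdly high” means, here is a back-of-the-envelope sketch based on the Hansen-Jagannathan bound: with power utility, the Sharpe ratio of any asset is at most approximately the coefficient of relative risk aversion times the volatility of consumption growth. The 2% consumption growth volatility is an assumed illustrative figure rather than a number taken from the data, and the function name is made up for this example.

```python
def min_risk_aversion(premium: float, return_vol: float,
                      consumption_vol: float) -> float:
    """Smallest relative risk aversion consistent with the approximate bound
    Sharpe ratio <= risk_aversion * consumption_vol."""
    sharpe_ratio = premium / return_vol
    return sharpe_ratio / consumption_vol

premium = 0.084 - 0.021   # stocks minus bills, from the figures in the essay
return_vol = 0.20         # annual std. dev. of real stock returns
consumption_vol = 0.02    # ASSUMPTION: illustrative consumption growth volatility

gamma_min = min_risk_aversion(premium, return_vol, consumption_vol)
print(f"Sharpe ratio: {premium / return_vol:.3f}")
print(f"risk aversion needed: at least {gamma_min:.1f}")
```

Under these assumptions the required coefficient comes out around 16 even if stock returns and consumption growth were perfectly correlated; any lower correlation pushes the requirement higher.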
An important point here is that the equity premium is still not quite as well measured as we might like it to be. International stock returns over long horizons are highly correlated since long-run economic growth is shared across the world and plausibly dominates most of the variance in the forecasts, and as such the dozens of different stock markets we have access to actually don’t give that much extra evidence over just looking at the S&P 500 about whether there is a long-run equity premium or not. The naive standard error estimate is easy to compute: we have a difference of 6.3% in annual returns of stocks versus bills, the standard deviation of stock returns is 20% per year, and our time window is 145 years long. If we put all that together we end up with a standard error of $20\% / \sqrt{145} \approx 1.7\%$ on the excess return of stocks over bonds, so if we take a two sigma confidence interval it’s entirely plausible that the equity premium is only half of the naive estimate of 6.3%. While the data provides strong evidence for the existence of the equity premium, the standard errors are large enough that our estimate of its magnitude can still plausibly range from about 3% per year to about 9.6% per year. Regardless of this fact, all “plausible” values of the premium are still much too large to be accounted for in quantitative terms.
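The naive standard error calculation is easy to reproduce. The sketch below treats annual returns as independent draws with a constant standard deviation, which is the simplifying assumption behind the word “naive”; the variable names are illustrative.

```python
import math

premium = 0.084 - 0.021   # mean excess return of stocks over bills
sigma = 0.20              # annual std. dev. of real stock returns
years = 2015 - 1870       # length of the sample in years

# Standard error of a sample mean of independent annual observations.
std_error = sigma / math.sqrt(years)
low = premium - 2 * std_error
high = premium + 2 * std_error

print(f"standard error: {std_error:.2%}")
print(f"two sigma interval: {low:.1%} to {high:.1%}")
```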
Things get even worse because it turns out that not only is the average return on the stock market much higher than it “should be”, the expected return also varies over time much more than it should. We can forecast returns on the stock market with simple dividend yield regressions, especially if we are forecasting mean returns over a five to ten year horizon instead of annual returns, and these regressions tell us that the average return of 8.4% per year actually masks a lot of variation over the course of the business cycle. Expected stock returns are low in good economic times and high in bad economic times.
To illustrate this with a current example, the Metaculus community forecasts that the S&P 500 will realize a real return of only 5.4% per year from 2022 to 2031, considerably below the historical mean return of 8.4%.
Puzzles of excess volatility
The main obstacle to recovering the magnitude of the equity premium quantitatively, as well as the magnitude of its variation over the business cycle, is that our economy is not as risky as stocks make it look. Both consumption growth and GDP growth vary by much less than stock returns; the difference is approximately an order of magnitude. If we expect GDP growth to continue on a particular trend line and recessions to be just temporary falls in GDP, then, since the stock price of a company depends not only on its current cash flows but on its entire future stream of cash flows, we should expect stock returns to vary less than GDP. In fact they vary by much more. This is the so-called “puzzle of excess volatility”, and it’s apparent not only in stock markets but also in other markets, most notably in foreign exchange.
In 1988 Campbell and Shiller came up with a way of formalizing all this discussion which until then had been up in the air. If you’re not interested in the technical details you may skip ahead, but it’s an important milestone in the history of asset pricing, so I cover it here to explain how we know some of the things we know on the subject.
We start with the definition of stock returns over a period: your returns equal price appreciation plus the dividends you earn on the stock, where we fold other cash transfers such as stock buybacks into the dividends for simplicity. Symbolically, we can express that as
$$R_{t+1} = \frac{P_{t+1} + D_{t+1}}{P_t}$$
where $R_{t+1}$ is the gross return in period $t+1$, $P_t$ is the start-of-period price in period $t+1$, and $D_{t+1}$ denotes the dividends paid out for this stock in period $t+1$. If we take natural logarithms of both sides and let variables in lowercase denote the logarithms of the variables in uppercase, we get
$$r_{t+1} = p_{t+1} - p_t + \log\left(1 + e^{d_{t+1} - p_{t+1}}\right)$$
If dividend yields don’t vary by too much, we can denote the exponential of the “average value” of $p_t - d_t$ by $\overline{PD}$, set $\rho = \overline{PD} / (1 + \overline{PD})$ ($\rho$ is slightly below $1$ for the S&P 500; values around $0.96$ are commonly used for annual data), and then we can approximate this further to get the Campbell-Shiller one period return identity
$$r_{t+1} \approx \rho (p_{t+1} - d_{t+1}) - (p_t - d_t) + \Delta d_{t+1}$$
where $\Delta d_{t+1} = d_{t+1} - d_t$ is log dividend growth. Typically since we’re only interested in variations in returns we discard the overall constants in this identity, so we can think of $\approx$ as holding up to a constant depending on $\rho$.
Importantly, the only ingredient in this linearized identity is the definition of return. We’ve made no assumptions about how the stock market works beyond that.
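A quick numerical check shows how good the linearization is. The sketch below compares the exact log return $\log((P_{t+1} + D_{t+1}) / P_t)$ with the linearized version $\kappa + \rho (p_{t+1} - d_{t+1}) - (p_t - d_t) + \Delta d_{t+1}$, where $\rho$ is derived from an assumed average price-dividend ratio and $\kappa$ is the constant that is usually dropped. The prices, dividends and the average ratio of 25 are made-up illustrative numbers, and the function names are hypothetical.

```python
import math

def exact_log_return(P0: float, P1: float, D1: float) -> float:
    """Exact one-period log return from the definition of return."""
    return math.log((P1 + D1) / P0)

def linearized_log_return(P0: float, D0: float, P1: float, D1: float,
                          mean_pd_ratio: float) -> float:
    """First-order approximation around an average price-dividend ratio."""
    rho = mean_pd_ratio / (1 + mean_pd_ratio)
    # Constant of the first-order expansion (normally discarded).
    kappa = -math.log(rho) - (1 - rho) * math.log(1 / rho - 1)
    pd0 = math.log(P0 / D0)           # log price-dividend ratio today
    pd1 = math.log(P1 / D1)           # log price-dividend ratio next period
    dividend_growth = math.log(D1 / D0)
    return kappa + rho * pd1 - pd0 + dividend_growth

# Illustrative numbers: price 100 -> 110, dividend 4.0 -> 4.2.
exact = exact_log_return(100.0, 110.0, 4.2)
approx = linearized_log_return(100.0, 4.0, 110.0, 4.2, mean_pd_ratio=25.0)
print(f"exact {exact:.5f}  linearized {approx:.5f}  error {exact - approx:.1e}")
```

As long as the price-dividend ratio stays near the point of expansion, the two numbers agree to several decimal places.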
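One thing we can now do is to put $p_t - d_t$ on the left hand side and iterate this identity forward up to some time $t + T$. If we do that, we get
$$p_t - d_t \approx \sum_{j=1}^{T} \rho^{j-1} \Delta d_{t+j} - \sum_{j=1}^{T} \rho^{j-1} r_{t+j} + \rho^{T} (p_{t+T} - d_{t+T})$$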
This expression may seem scary, but what it expresses is actually very intuitive. If the price of a stock is high today relative to its current dividends, as a matter of definition there are only three things that can happen in the future:
Future dividend growth will be high.
Future returns on the stock will be low.
The price will be even higher compared to the dividends in the future.
Those three possibilities correspond to the three terms in the right hand side of that formula.
The reason we go through with this funny derivation is that now we can actually understand what the puzzle of excess volatility is about. As a matter of accounting, any volatility in $p_t - d_t$ must come from volatility in one of the three terms on the right hand side. Since we in fact know that there’s excess volatility, one of these terms must be the culprit. We can figure out which term is responsible by noting that taking covariances of both sides with $p_t - d_t$ and dividing by the variance of $p_t - d_t$ gives an identity
$$1 \approx \beta_{\Delta d} - \beta_r + \rho^T \beta_{pd}$$
which certain regression coefficients must obey: $\beta_{\Delta d}$, $\beta_r$ and $\beta_{pd}$ are the slopes from regressing the discounted sum of future dividend growth, the discounted sum of future returns and the future price-dividend ratio $p_{t+T} - d_{t+T}$, respectively, on $p_t - d_t$. We can then go and run these regressions to see which of the betas are contributing to the sum. What we find in the data is that $\beta_{\Delta d}$ and $\beta_{pd}$ are both approximately zero, while $\beta_r$ is approximately $-1$ and dominates most of the sum. In other words, for broad stock market indices (not for individual stocks!), high price dividend ratios forecast neither strong future dividend growth nor even higher future price dividend ratios; they merely forecast weak future returns.
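The accounting nature of this decomposition can be illustrated on simulated data. The sketch below is not a replication of the empirical result: it builds an artificial world in which the log price-dividend ratio mean-reverts, dividend growth is unpredictable by construction, and returns are defined through the linearized identity with constants dropped. All parameter values are arbitrary assumptions, and the slope on returns is written with the sign convention that the three slopes satisfy `beta_dd - beta_r + rho**T * beta_pd = 1`.

```python
import random

random.seed(0)
rho, phi, T, n = 0.96, 0.9, 15, 5000   # ASSUMED illustrative parameters

# Log price-dividend ratio as an AR(1); dividend growth as pure noise.
pdr = [0.0]
for _ in range(n + T):
    pdr.append(phi * pdr[-1] + random.gauss(0, 0.15))
dd = [random.gauss(0, 0.05) for _ in range(n + T + 1)]
# Returns defined by the linearized identity; r[0] is an unused placeholder.
r = [0.0] + [rho * pdr[t + 1] - pdr[t] + dd[t + 1] for t in range(n + T)]

def slope(x, y):
    """OLS slope from regressing y on x (with an intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var

def discounted_sum(series, t):
    """Sum of rho**(j-1) * series[t+j] for j = 1..T."""
    return sum(rho ** (j - 1) * series[t + j] for j in range(1, T + 1))

x = pdr[:n]
beta_dd = slope(x, [discounted_sum(dd, t) for t in range(n)])
beta_r = slope(x, [discounted_sum(r, t) for t in range(n)])
beta_pd = slope(x, [pdr[t + T] for t in range(n)])
total = beta_dd - beta_r + rho ** T * beta_pd

print(f"beta_dd={beta_dd:.3f}  beta_r={beta_r:.3f}  beta_pd={beta_pd:.3f}")
print(f"beta_dd - beta_r + rho^T * beta_pd = {total:.6f}")
```

Because the identity holds exactly in this simulated world, the three terms add up to one up to floating point error, and since dividend growth carries no predictable component, almost all of the variation in the price-dividend ratio is attributed to future returns.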
If we take expectations of the Campbell-Shiller present value identity at time $t$, we see that the puzzle of excess volatility is the same puzzle as the puzzle of time-varying expected returns, which is the same puzzle as the time-varying equity premium! In some sense, there’s “only one puzzle” about all these aberrant behaviors of asset markets, and “equity premium puzzle” is as good of a name as any.
Remember that we already have a question about how much stock returns will be from 2022 to 2031. With the insight we get from the Campbell-Shiller present value formula, we might wonder what will contribute to the stock returns: will it be further increases in price-dividend ratios (meaning decreases in dividend yields), which have been trending upward for the past forty years; or will it be stronger growth in dividends? A simple way to operationalize a forecast of this is a question on the future dividend yield. I’m personally somewhat more pessimistic than the community about the future returns on the S&P 500, and by the logic of Campbell-Shiller that translates to a median dividend yield forecast that is higher than the community’s.
If the Campbell-Shiller present value formula looks too complicated, one alternative way to understand the decline in dividend yields is to use the simple Gordon growth formula. This special case of Campbell-Shiller states that if a stock had a constant dividend growth of $g$ and a constant rate of return $r$ which went on forever, then its dividend yield would be equal to $r - g$. In the past few decades we have seen an overall downward trend in various real rates of return that’s been more pronounced than the decline in the rate of economic growth, so we might think that this explains the secular downward trend in dividend yields. However, if we actually look at real S&P 500 returns over the past 40 years, they’ve been rather high: around 9% per year on average.
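The Gordon growth formula is simple enough to verify by brute force. The sketch below prices a stream of dividends growing at a constant rate, both with the closed form and by summing discounted dividends directly; the rates used are illustrative assumptions rather than estimates, and the function names are hypothetical.

```python
def gordon_price(next_dividend: float, r: float, g: float) -> float:
    """Price of a stock with constant return r and dividend growth g (r > g)."""
    if r <= g:
        raise ValueError("the formula requires r > g")
    return next_dividend / (r - g)

def discounted_dividends(next_dividend: float, r: float, g: float,
                         years: int = 2000) -> float:
    """Present value of the growing dividend stream, truncated at `years`."""
    return sum(next_dividend * (1 + g) ** (k - 1) / (1 + r) ** k
               for k in range(1, years + 1))

# Illustrative rates: growth of 2% with a required return of 6% versus 4%.
price_high_r = gordon_price(1.0, r=0.06, g=0.02)
price_low_r = gordon_price(1.0, r=0.04, g=0.02)
brute_force = discounted_dividends(1.0, r=0.06, g=0.02)

print(f"r=6%: price {price_high_r:.1f}, dividend yield {1 / price_high_r:.1%}")
print(f"r=4%: price {price_low_r:.1f}, dividend yield {1 / price_low_r:.1%}")
print(f"brute-force price at r=6%: {brute_force:.4f}")
```

In this example a two percentage point fall in the required return halves the dividend yield and doubles the price without any change in dividends.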
While we see no evidence of it yet, it’s theoretically also possible that the current low dividend yields correspond to high $g$ rather than low $r$, in other words, to high anticipated future dividend growth. This would most likely have to come together with expectations of stronger future economic growth. Much like it was with the dotcom boom, the question is whether the currently low dividend yields will pay off in some form in the future, or whether we will simply have lower expected returns for the foreseeable future.
Explanations
While the finding that there are time-varying expected returns on asset markets and that they are responsible for “excess volatility” is ironclad, it’s not at all clear where this variation is coming from. Shiller advocated a “behavioralist” explanation which attributed the variation to investor irrationality, but this explanation seems quite weak since the time variation corresponds quite well to the peaks and troughs of the business cycle. Moreover, irrationality can’t account for the level of the equity premium unless we assume that people have been irrational in the same way and in roughly the same magnitude for a century and a half. With all these caveats, however, the primary scientific weakness of behavioralist explanations is that they don’t actually make any predictions beyond the patterns we observe in the data, so they are of little predictive value.
The alternative explanation is that time-varying expected returns correspond to either time-varying risks or variation in people’s willingness to bear risk over the course of the business cycle. 2009 was a great time to buy stocks, but you may be more scared of buying stocks when you’re more afraid to get a pay cut or to lose your job. These ideas can indeed be pushed to produce models which can quantitatively reproduce the time-varying equity premium along with its magnitude, but all of the models we get in this way have other undesirable properties which make them not very convincing as resolutions of the puzzle.
Perhaps the most popular class of models recently is the “long-run risk” or “recursive utility” models of the equity premium. If you’ve done reinforcement learning or economics before, you might have run into dynamic value functions of the kind
$$V = \mathbb{E}\left[\sum_{t=0}^{\infty} \beta^t u(c_t)\right]$$
where $\beta$ is a time discounting factor, $u$ is an instantaneous utility function with some usual properties and $c_t$ is how much the agent consumes in period $t$. While this definition is convenient to work with, the central problem it has is that regardless of what we pick $u$ to be, we treat different times and different states of the world in an identical way. Both of them contribute as summands to $V$: time $t$ is scaled by a factor $\beta^t$ and a particular state of the world is scaled by its probability of occurring, but beyond that the way they contribute to total value is identical. This is a problem because we know that people are actually much more reluctant to spread out consumption over states of the world than they are to spread it out over time. In concrete terms, people are much more willing to cut their consumption by half this year to consume twice as much next year, while they would be considerably more reluctant to take a gamble which cuts their consumption this year by half with a probability of $1/2$ and doubles it with a probability of $1/2$.
In economic terms, this form of the value function cannot separate risk aversion from the elasticity of intertemporal substitution, and this is really the mystery of the equity premium puzzle: risk-free rates of return are low and don’t vary too much along with changes in consumption growth, which implies a high elasticity of intertemporal substitution; while the equity premium is high, which implies high relative risk aversion. For $V$ of the type I wrote above, we have that these two quantities always multiply to $1$ as a matter of definition, so the fact that both of them must be high gives a contradiction.
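The claim that the two quantities multiply to one can be seen in the textbook case of power utility. The derivation below is a sketch under that assumption; the symbols $\gamma$ (curvature of utility) and $R$ (gross risk-free rate) are introduced here for illustration.

```latex
\begin{align*}
u(c) &= \frac{c^{1-\gamma}}{1-\gamma}
  && \text{(power utility, assumed for illustration)} \\
\text{RRA} &= -\frac{c\,u''(c)}{u'(c)} = \gamma
  && \text{(relative risk aversion)} \\
u'(c_t) &= \beta R\, u'(c_{t+1})
  \;\Longrightarrow\;
  \log\frac{c_{t+1}}{c_t} = \frac{1}{\gamma}\left(\log \beta + \log R\right)
  && \text{(Euler equation without uncertainty)} \\
\text{EIS} &= \frac{\partial \log (c_{t+1}/c_t)}{\partial \log R} = \frac{1}{\gamma}
  && \text{(elasticity of intertemporal substitution)} \\
\text{RRA} \times \text{EIS} &= 1
\end{align*}
```

A single parameter therefore controls both attitudes, so raising risk aversion to fit the equity premium mechanically lowers the willingness to substitute consumption over time.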
The way to fix this is to do aggregation over time using a CES function instead of a naive sum, and this is the origin of the long-run risk model. The reason it’s called the “long-run risk model” is that it has a curious property: if the model holds, then stock prices respond to news about future consumption that isn’t reflected in today’s consumption. In fact, in the model the only reason people are scared of recessions is what they signal about the long-term future of their consumption stream rather than the immediate and current effect on their consumption. Whether this is a feature or a bug of the model is up for debate, but we can certainly create some questions to test this prediction. One easy, though quite imperfect, approach is to test the tightness of the connection between bad current economic conditions and market crashes.
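For concreteness, one standard way of writing such a CES aggregator is the Epstein-Zin recursion shown below. This is a sketch of the usual textbook form and not necessarily the exact specification of any particular long-run risk paper; the symbols $\psi$ (elasticity of intertemporal substitution) and $\gamma$ (relative risk aversion) are introduced here for illustration.

```latex
\[
V_t = \left[ (1-\beta)\, c_t^{\,1 - 1/\psi}
      + \beta \left( \mathbb{E}_t\!\left[ V_{t+1}^{\,1-\gamma} \right]
        \right)^{\frac{1 - 1/\psi}{1-\gamma}} \right]^{\frac{1}{1 - 1/\psi}}
\]
```

Here $\psi$ and $\gamma$ can be chosen independently, and setting $\gamma = 1/\psi$ collapses the recursion back to the additive discounted expected utility form, which is the special case where risk aversion and intertemporal substitution are tied together.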
If we proxy for bad economic conditions using the unemployment rate, two questions attempting to do this for how far the market is down from its previous peak and the level of the VIX index are available on Metaculus. In my opinion the community forecast on the market crash question is too high and the forecast on the VIX question too low, since I think the VIX exceeding 50 is actually a more serious event than the market being 30% down from its last high.
Other popular explanations which have different properties from the long-run risk model when it comes to this kind of question include habit models and idiosyncratic risk models. Macro-Finance is a literature review on the subject which covers these models along with many other proposed explanations.
Conclusion
There are many different facets of the equity premium that I haven’t been able to get into in this essay, as it’s already running quite long. These include the surprising connection between foreign exchange volatility and the equity premium puzzle, cross-country correlations of stock returns, et cetera. Still, I hope I was able to give a good overview of the subject which poses the central questions related to the puzzle and goes over some of the directions that academic investigation of the subject has taken.
Is “The S&P 500” actually a good comparison entity? The companies that make it up vary all the time, on average remaining in it just a few decades, while US treasuries are from the same US government as always. Is there any company whose stock has average “S&P 500 level” returns over the course of a century or more? And presumably if there is, no simple method can consistently predict which one? In that case, can (some of?) the premium come from the need to constantly have money in motion and rebalancing and deciding what to include in indexes at all?
Similarly, if I instead compare the S&P 500 to a portfolio of bonds from all the large countries over the past century and a half, i.e. if I have to include the USSR and the Ottoman Empire and Imperial China in a constantly-updating index of bonds as my comparison point, what does that do to my expected government debt rate of return?
How much, if at all, is this a result of the choice to measure returns in USD, a currency whose supply is controlled by one of the entities involved in the comparison? The US government can actually, literally guarantee that as long as it exists at all, it will have enough dollars to pay its debts. Would it matter if I measured both financial assets in some other metric (although IDK which one, maybe something commodity based?).
Would the premium remain the same in a world where humans were immortal? People and organizations generally don’t just save and invest money, they do it for some purpose, with some time horizon in mind. If I expected to live 1,000 years it would matter a lot less to me if a crash caused me to delay buying a house by an extra decade, if it meant I’d most likely be able to get one a year sooner, or get a more expensive one, on average. The idea of putting a significant fraction of my savings in a low-yield US treasury would seem silly to immortal-me. Note: does this relate to why large endowments of long-lived trusts and organizations get higher returns?
Answering your questions in order:
What matters is that it’s something you can invest in. Choosing the S&P 500 is not really that important in particular. There doesn’t have to be a single company whose stock is perfectly correlated with the S&P 500 (though nowadays we have ETFs which more or less serve this purpose) - you can simply create your own value-weighted stock index and rebalance it on a daily or weekly basis to adjust for the changing weights over time, and nothing will change about the main arguments. This is actually what the authors of The Rate of Return on Everything do in the paper, since we don’t really have good value-weighted benchmark indices for stocks going back to 1870.
The general point (which I hint at but don’t make in the post) is that we persistently see high Sharpe ratios in asset markets. The article I cite at the start of the post also has data on real estate returns, for example, which exhibit an even stronger puzzle because they are comparable to stock returns in real terms but have half the volatility.
I don’t know the answer to your exact question, but a lot of governments have bonds which are quite risky and so this comparison wouldn’t be appropriate for them. If you think of the real yield of bonds as consisting of a time preference rate plus some risk premium (which is not a perfect model but not too far off), the rate of return on any one country’s bonds puts an upper bound on the risk-free rate of return. Therefore we don’t need to think about investing in countries whose bonds are risky assets in order to put a lower bound on the size of the equity premium relative to a risk-free benchmark.
This only has a negligible effect because the returns are inflation-adjusted and over long time horizons any real exchange rate deviation from the purchasing power parity benchmark is going to be small relative to the size of the returns we’re talking about. Phrased another way: inflation-adjusted stock prices are not stationary whereas real exchange rates are stationary, so as long as the time horizon is long enough you can ignore exchange rate effects, provided you perform inflation adjustment.
This is an interesting question and I don’t know the answer to it. Partly this is because we don’t really understand where the equity premium is coming from to begin with, so thinking about how some hypothetical change in the human condition would alter its size is not trivial. I think different models of the equity premium actually make different predictions about what would happen in such a situation.
It’s important, though, to keep in mind that the equity premium is not about the rate of time preference: risk-free rates of return are already quite low in our world of mortal people. It’s more about the volatility of marginal utility growth, and there’s no logical connection between that and the time for which people are alive. One of the most striking illustrations of that is Campbell and Cochrane’s habit formation model of the equity premium, which produces a long-run equity premium even at infinite time horizons, something a lot of other models of the equity premium struggle with.
I think in the real world if people became immortal the long-run (or average) equity premium would fall, but the short-run equity premium would still sometimes be high, in particular in times of economic difficulty.
Isn’t one possible solution to the equity puzzle just that US stocks have outperformed expectations recently? Returns on an index of European stocks are basically flat over the last 20 years.
Over 20 years that’s possible (and I think it’s in fact true), but the paper I cite in the post gives some data which makes it unlikely that the whole past record is outperformance. It’s hard to square 150 years of over 6% mean annual equity premium with 20% annual standard deviation with the idea that the true stock return is actually the same as the return on T-bills. The “true” premium might be lower than 6% but not by too much, and we’re still left with more or less the same puzzle even if we assume that.