Forecasting rare events

In an earlier post, I looked at some general domains of forecasting. This post looks at some more specific classes of forecasting, some of which overlap with the general domains, and some of which are more isolated. The common thread to these classes of forecasting is that they involve rare events.
Different types of forecasting for rare events
When it comes to rare events, there are three different classes of forecasts:
Point-in-time-independent probabilistic forecasts: Forecasts that provide a probability estimate for the event occurring in a given timeframe, but with no distinction based on the point in time. In other words, the forecast may say “there is a 5% chance of an earthquake higher than 7 on the Richter scale in this geographical region in a year” but the forecast is not sensitive to the choice of year. These are sufficient to inform decisions on general preparedness. In the case of earthquakes, for instance, the amount of care to be taken in building structures can be determined based on these forecasts. On the other hand, such forecasts are useless for deciding the timing of specific activities.
Point-in-time-dependent probabilistic forecasts: Forecasts that provide a probability estimate that varies somewhat over time based on history, but aren’t precise enough for a remedial measure that substantially offsets major losses. For instance, if I know that an earthquake will occur in San Francisco in the next 6 months with probability 90%, it’s still not actionable enough for a mass evacuation of San Francisco. But some preparatory measures may be undertaken.
Predictions made with high confidence (i.e., a high estimated probability when the event is predicted) and a specific time, location, and characteristics: Precise predictions of date and time, sufficient for remedial measures that substantially offset major losses (though the remedial measures may themselves be hugely costly, if much less costly than the losses they avert). The situation with hurricanes, tornadoes, and blizzards is roughly in this category.
Statistical distributions: normal distributions versus power law distributions
Perhaps the most ubiquitous distribution used in probability and statistics is the normal distribution. The normal distribution is a symmetric distribution whose probability density function decays superexponentially with distance from the mean (more precisely, it is exponential decay in the square of the distance). In other words, the density falls off slowly near the mean and ever faster as you move away. Thus, for instance, the ratio of pdfs at 2 standard deviations from the mean and at 1 standard deviation from the mean is greater than the ratio of pdfs at 3 standard deviations and at 2 standard deviations. To give explicit numbers: about 68.2% of the distribution lies between −1 and +1 SD, 95.4% lies between −2 and +2 SD, 99.7% lies between −3 and +3 SD, and 99.99% lies between −4 and +4 SD. So the probability of being more than 4 standard deviations from the mean is less than 1 in 10,000.
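To make these numbers concrete, here is a minimal Python sketch (I use scipy.stats purely for convenience; any statistics package would do) that reproduces the figures above:

```python
# Tail mass of the standard normal at 1-4 standard deviations, reproducing the
# 68.2% / 95.4% / 99.7% / 99.99% figures quoted above.
from scipy.stats import norm

for k in range(1, 5):
    inside = norm.cdf(k) - norm.cdf(-k)   # probability of landing within +/- k SD
    outside = 1 - inside                  # probability of landing beyond +/- k SD
    print(f"within {k} SD: {inside:.4%}   beyond {k} SD: about 1 in {1/outside:,.0f}")

# The pdf ratios mentioned above: the decay accelerates with distance from the mean.
print(norm.pdf(2) / norm.pdf(1))  # roughly 0.22
print(norm.pdf(3) / norm.pdf(2))  # roughly 0.08, i.e., a faster relative drop
```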
If the probability distribution for intensity looks (roughly) like a normal distribution, then high-intensity events are extremely unlikely, and we do not have to worry about them much.
The situations where rare event forecasting becomes more important are those where high-intensity, or “extreme,” events occur rarely, but not as rarely as they would under a normal distribution. We say that the tails of such distributions are thicker than those of the normal distribution, and the distributions are termed “thick-tailed” or “fat-tailed” distributions. [Formally, the thickness of tails is often measured using a quantity called excess kurtosis, which divides the fourth central moment by the square of the second central moment (the second central moment is the variance, i.e., the square of the standard deviation), then subtracts off the number 3, which is the corresponding value for the normal distribution. If the excess kurtosis for a distribution is positive, it is called a thick-tailed distribution.]
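Here is a minimal sketch of that definition of excess kurtosis, applied to synthetic samples (a normal sample and a Student-t sample, the latter chosen only as a convenient example of a thicker-tailed distribution):

```python
# Excess kurtosis as defined above: fourth central moment divided by the squared
# variance, minus 3 (which is the value for the normal distribution).
import numpy as np

def excess_kurtosis(x):
    centered = np.asarray(x, dtype=float) - np.mean(x)
    m2 = np.mean(centered ** 2)   # second central moment (variance)
    m4 = np.mean(centered ** 4)   # fourth central moment
    return m4 / m2 ** 2 - 3.0

rng = np.random.default_rng(0)
print(excess_kurtosis(rng.normal(size=100_000)))            # close to 0
print(excess_kurtosis(rng.standard_t(df=5, size=100_000)))  # clearly positive: thicker tails
```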
The most common examples of such distributions that are of interest to us are power law distributions. Here, the probability density is proportional to a negative power of the magnitude. If you remember some basic precalculus/calculus, you’ll recall that power functions (such as the square function or cube function) grow more slowly than exponential functions. Correspondingly, power law distributions decay subexponentially: they decay more slowly than exponential decay (the decay may start off fast, but it keeps slowing down in relative terms). As noted above, the pdf for the normal distribution decays exponentially in the square of the distance from the mean, so the upshot is that power law tails decay much more slowly than normal tails.
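To see how different the two tails are, here is a small sketch comparing the tail probability P(X > x) of a Pareto (power law) distribution with that of a standard normal; the exponent 2.5 is an arbitrary illustrative choice, not a value taken from any of the domains below:

```python
# Tail probability comparison: a power law (Pareto with minimum 1) versus a standard
# normal, with x measured in the normal's standard deviations. Purely illustrative.
from scipy.stats import norm, pareto

alpha = 2.5  # illustrative tail exponent, an assumption
for x in [2, 5, 10, 20]:
    print(f"x = {x:2d}   power law: {pareto.sf(x, b=alpha):.2e}   normal: {norm.sf(x):.2e}")
```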
For most of the rare event classes we discuss, to the extent that it has been possible to pin down a distribution, it has looked a lot more like a power law distribution than a normal distribution. Thus, rare events need to be heeded. (There’s obviously a selection effect here: for those cases where the distributions are close to normal, forecasting rare events just isn’t that challenging, so they wouldn’t be included in my post).
UPDATE: Aaron Clauset, who appears in #4, pointed me (via email) to his Rare Events page, containing the code (Matlab and Python) that he used in his terrorism statistics paper, mentioned as an update at the bottom of #4. He noted in the email that the statistical methods are fairly general, so interested people could apply the code to rare events in other domains.
Talebisms
One of the more famous advocates of the idea that people overestimate the ubiquity of normal distributions and underestimate the prevalence of power law distributions is Nassim Nicholas Taleb. Taleb calls the world of normal distributions Mediocristan (the world of mediocrity, where things are mostly ordinary and weird things are very very rare) and the world of power law distributions Extremistan (the world of extremes, where rare and weird events are more common). Taleb has elaborated on this thesis in his book The Black Swan, though some parts of the idea are also found in his earlier book Fooled by Randomness.
I’m aware that a lot of people swear by Taleb, but I personally don’t find his writing very impressive. He does cover a lot of important ideas, but they didn’t originate with him, and he goes off on a lot of tangents. In contrast, I found Nate Silver’s The Signal and the Noise a pretty good read, and although it isn’t focused on rare events per se, I have drawn on the parts of it that do discuss such forecasting in this post.
(Sidenote: My criticism of Taleb is broadly similar to that offered by Jamie Whyte here in Standpoint Magazine. Also, here’s a review by Steve Sailer of Taleb. Sailer is much more favorably inclined to the normal distribution than Taleb is, and this is probably related to his desire to promote IQ distributions/The Bell Curve type ideas, but I think many of Sailer’s criticisms are spot on).
Examples of rare event classes that we discuss in this post
The classes discussed in this post include:
Earthquakes: Category #1; also hypothesized to follow a power law distribution.
Volcanoes: Category #2.
Extreme weather events (hurricanes/cyclones, tornadoes, blizzards): Category #3.
Major terrorist acts: Questionable; at least Category #1, and some argue Category #2 or Category #3. Hypothesized to follow a power law distribution.
Power outages (could be caused by any of #1-#4, typically #3)
Server outages (could be caused by #5)
Financial crises
Global pandemics, such as the 1918 flu pandemic (popularly called the “Spanish flu”) that, according to Wikipedia, “infected 500 million people across the world, including remote Pacific islands and the Arctic, and killed 50 to 100 million of them—three to five percent of the world’s population.” They probably fall under Category #2, but I couldn’t get a clear picture. (Pandemics were not in the list at the time of original publication of the post; I added them based on a comment suggestion).
Near-earth object impacts (not in the list at the time of original publication of the post; I added them based on a comment suggestion).
Other examples of rare events would also be appreciated.
#1: Earthquakes
Earthquake prediction remains mostly in category 1: there are probability estimates of the occurrence of earthquakes of a given severity or higher within a given timeframe, but these estimates do not distinguish between different points in time. In The Signal and the Noise, statistician and forecasting expert Nate Silver talks to Susan Hough (Wikipedia) of the United States Geological Survey and describes what she has to say about the current state of earthquake forecasting:
What seismologists are really interested in— what Susan Hough calls the “Holy Grail” of seismology— are time-dependent forecasts, those in which the probability of an earthquake is not assumed to be constant across time.
Silver, Nate (2012-09-27). The Signal and the Noise: Why So Many Predictions Fail-but Some Don’t (p. 154). Penguin Group US. Kindle Edition.
The whole Silver chapter is worth reading, as is the Wikipedia page on earthquake prediction, which covers much of the same ground.
In fact, even for time-independent earthquake forecasting, currently the best known forecasting method is the extremely simple Gutenberg-Richter law, which says that for a given location, the frequency of earthquakes obeys a power law with respect to intensity. Since the Richter scale is logarithmic (to base 10), this means that adding a point on the Richter scale makes the frequency of earthquakes drop to a fixed fraction of the previous value (roughly a tenth, since the fitted exponent is typically close to 1). Note that the Gutenberg-Richter law can’t be the full story: there are probably absolute limits on the intensity of earthquakes (some people believe that an earthquake of intensity 10 or higher is impossible). But so far, it seems to have the best track record.
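As a concrete (and purely illustrative) sketch of what the law says, here is the relation log10 N(M) = a − b·M in code, with made-up parameter values rather than fitted values for any real region:

```python
# Gutenberg-Richter relation: log10 N(M) = a - b * M, where N(M) is the expected
# number of earthquakes of magnitude >= M per year. Parameter values are made up.
a, b = 4.0, 1.0

def annual_rate(magnitude):
    return 10 ** (a - b * magnitude)

for m in range(4, 9):
    print(f"magnitude >= {m}: about {annual_rate(m):g} per year")
# Each extra point of magnitude divides the expected frequency by 10**b.
```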
Why haven’t we been able to come up with better models? This relates to the problem of overfitting, common in machine learning and statistics: when the data points are few and quite noisy, trying a more complicated law (with more freely varying parameters) ends up fitting the noise in the data rather than the signal, and therefore ends up being a poor fit for new, out-of-sample data. The problem is dealt with in statistics using various goodness-of-fit tests and measures such as the Akaike information criterion, and it’s dealt with in machine learning using a range of techniques such as cross-validation, regularization, and early stopping. These approaches generally work well in situations where there are lots of data and lots of parameters. But in cases where there is very little data, it often makes sense to just manually select a simple model. The Gutenberg-Richter law has two parameters, and can be fit using a simple linear regression. There isn’t enough information to reliably fit even modestly more complicated models, such as the characteristic earthquake models, and past attempts based on characteristic earthquakes failed in both directions (a predicted earthquake at Parkfield failed to materialize within the predicted window, and the probability of the 2011 Japan earthquake was underestimated by the model relative to the Gutenberg-Richter law).
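Here is a toy illustration of that fitting step, with synthetic counts (the numbers are invented; a real analysis would use an actual earthquake catalog and worry about the completeness of the record):

```python
# Fitting the two Gutenberg-Richter parameters by linear regression on log counts.
# The counts below are synthetic and only illustrate the procedure.
import numpy as np

magnitudes = np.array([4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0])
counts     = np.array([2100, 700, 240, 70, 21, 8, 2])   # hypothetical events of magnitude >= M

slope, intercept = np.polyfit(magnitudes, np.log10(counts), deg=1)
a, b = intercept, -slope                                 # convention: log10 N = a - b * M
print(f"fitted a = {a:.2f}, b = {b:.2f}")

# A cubic tracks these seven noisy points more closely, but with four free parameters
# on so little data it mostly fits noise -- the overfitting problem described above.
cubic_coeffs = np.polyfit(magnitudes, np.log10(counts), deg=3)
```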
Silver’s chapter and other sources do describe some possibilities for short-term forecasting based on foreshocks, aftershocks, and other seismic disturbances, but they note considerable uncertainty.
The existence of time-independent forecasts for earthquakes has probably had major humanitarian benefits. Building codes and standards, in particular, can be adapted to the probability of earthquakes. For instance, building standards are stricter in the San Francisco Bay Area than in other parts of the United States, partly because of the greater probability of earthquakes. Note also that Gutenberg-Richter does make out-of-sample predictions: it can use the frequency of low-intensity earthquakes to predict the frequency of high-intensity earthquakes, and therefore obtain a time-independent forecast of such an earthquake even in a region that may never have experienced one.
#2: Volcanic eruptions
Volcanoes are an easier case than earthquakes. Silver’s book doesn’t discuss them, but the Wikipedia article offers basic information. A few points:
Volcanic activity falls close to category #2: time-dependent forecasts can be made, albeit with considerable uncertainty.
Volcanic activity poses less immediate risk because fewer people live close to the regions where volcanoes typically erupt.
However, volcanic activity can affect regional and global climate for a few years (in the cooling direction), and might even shift the intercept of other long-term secular and cyclic trends in climate (the reason is that the ash and aerosol particles released by volcanoes into the atmosphere reflect some solar radiation, reducing the amount that is absorbed). For instance, the 1991 Mount Pinatubo eruption is credited with causing the next 1-2 years to be cooler than they otherwise would have been, masking the heating effect of a strong El Nino.
#3: Extreme weather events (lightning, hurricanes/cyclones, blizzards, tornadoes)
Forecasting for lightning and thunderstorms has improved quite a bit over the last century, and falls squarely within Category #3. In The Signal and the Noise, Nate Silver notes that the annual probability of an American dying from lightning has dropped from 1 in 400,000 in 1940 to 1 in 11,000,000 today, and a large part of the credit goes to better weather forecasting, which lets people avoid the outdoors at the times and places where lightning might strike.
Forecasting for hurricanes and cyclones (which are the same weather phenomenon, just named differently in different parts of the world) is quite good, and getting better. It falls squarely in Category #3: in addition to having general probability estimates of the likelihood of particular types of extreme weather events, we can forecast them a day or a few days in advance, allowing for preparation and minimization of negative impact.
The precision of forecasts of where the eye of the storm will hit has increased about 3.5-fold in length terms (so about 12-fold in area terms) over the last 25 years. Nate Silver notes that 25 years ago, the National Hurricane Center’s forecasts of where a hurricane would hit on landfall, made three days in advance, were 350 miles off on average. Now they’re about 100 miles off on average. Most of the major hurricanes that hit the United States, and many other parts of the world, were forecast well in advance, and people even made preparations (for instance, by declaring holidays, or stocking up on goods). Blizzard forecasting is also fairly impressive: I was in Chicago in 2011 when a blizzard hit, and it had been forecast at least a day in advance. With tornadoes, tornado warning alerts are often issued, though the tornado often doesn’t actually touch down even after the alert is issued (fortunately for us).
See also my posts on weather forecasting and climate forecasting.

#4: Major terrorist acts
Terrorist attacks are interesting. It has been claimed that the frequency-damage relationship for terrorist attacks follows a power law. The academic paper that popularized this observation is by Aaron Clauset, Maxwell Young and Kristian Gleditsch, titled “On the Frequency of Severe Terrorist Attacks” (Journal of Conflict Resolution 51(1), 58-88 (2007)), here. Bruce Schneier wrote a blog post about a later paper by Clauset and Frederick W. Wiegel, and see also more discussion here, here, here, and here (I didn’t select these links through a very discerning process; I just picked the top results of a Google search).
Silver’s book does allude to power laws for terrorism. I initially couldn’t find any reference to Clauset in the book (oops, it seems my Kindle search was buggy!), but it says the following about Clauset:
Clauset’s insight, however, is actually quite simple— or at least it seems that way with the benefit of hindsight. What his work found is that the mathematics of terrorism resemble those of another domain discussed in this book: earthquakes.
Imagine that you live in a seismically active area like California. Over a period of a couple of decades, you experience magnitude 4 earthquakes on a regular basis, magnitude 5 earthquakes perhaps a few times a year, and a handful of magnitude 6s. If you have a house that can withstand a magnitude 6 earthquake but not a magnitude 7, would it be right to conclude that you have nothing to worry about?
Of course not. According to the power-law distribution that these earthquakes obey, those magnitude 5s and magnitude 6s would have been a sign that larger earthquakes were possible—inevitable, in fact, given enough time. The big one is coming, eventually. You ought to have been prepared.
Terror attacks behave in something of the same way. The Lockerbie bombing and Oklahoma City were the equivalent of magnitude 7 earthquakes. While destructive enough on their own, they also implied the potential for something much worse— something like the September 11 attacks, which might be thought of as a magnitude 8. It was not an outlier but instead part of the broader mathematical pattern.
Silver, Nate (2012-09-27). The Signal and the Noise: Why So Many Predictions Fail-but Some Don’t (pp. 427-428). Penguin Group US. Kindle Edition.
So terrorist attacks are at least in Category 1. What about Categories 2 and 3? Can we forecast terrorist attacks the way we can forecast volcanoes, or the way we can forecast hurricanes? One difference between terrorist acts and the “acts of God” discussed so far is that to the extent one has inside information about a terrorist attack that’s good enough to predict it with high accuracy, that information is usually also sufficient to actually prevent the attack. So Category 3 becomes trickier to define. Should we count the numerous foiled terrorist plots as evidence that terrorist acts can be successfully “predicted,” or should we only consider successful terrorist acts in the denominator? Another complication is that terrorist acts are responsive to geopolitical decisions in ways that earthquakes are definitely not, with extreme weather events falling somewhere in between.
As for Category 2, the evidence is unclear, but it’s highly likely that terrorist acts can be forecast in a time-dependent fashion to a considerable degree. If you want to crunch the numbers yourself, the Global Terrorism Database (website, Wikipedia) and the Suicide Attack Database (website, Wikipedia) are available for you to use. I discussed some general issues with political and conflict forecasting in my earlier post on the subject.
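As a taste of what such number-crunching looks like, here is a minimal sketch of the standard maximum-likelihood estimate of a power-law tail exponent, applied to made-up severity numbers (this is a simplified stand-in for the much more careful model fitting and goodness-of-fit testing in the papers above):

```python
# Continuous maximum-likelihood estimate of a power-law exponent,
# alpha = 1 + n / sum(ln(x_i / x_min)), applied to invented event severities.
import numpy as np

severities = np.array([1, 1, 2, 2, 3, 4, 5, 7, 10, 12, 18, 25, 40, 90, 300], dtype=float)
x_min = 1.0                               # assumed start of the power-law region
tail = severities[severities >= x_min]
alpha_hat = 1.0 + len(tail) / np.sum(np.log(tail / x_min))
print(f"estimated tail exponent: {alpha_hat:.2f}")
```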
UPDATE: Clauset emailed me with some corrections to this section of the post, which I have made. He also pointed to a recent paper he co-wrote with Ryan Woodard about estimating the historical and future probabilities of terror events, available on the arXiv. Here’s the abstract:
Quantities with right-skewed distributions are ubiquitous in complex social systems, including political conflict, economics and social networks, and these systems sometimes produce extremely large events. For instance, the 9/11 terrorist events produced nearly 3000 fatalities, nearly six times more than the next largest event. But, was this enormous loss of life statistically unlikely given modern terrorism’s historical record? Accurately estimating the probability of such an event is complicated by the large fluctuations in the empirical distribution’s upper tail. We present a generic statistical algorithm for making such estimates, which combines semi-parametric models of tail behavior and a nonparametric bootstrap. Applied to a global database of terrorist events, we estimate the worldwide historical probability of observing at least one 9/11-sized or larger event since 1968 to be 11-35%. These results are robust to conditioning on global variations in economic development, domestic versus international events, the type of weapon used and a truncated history that stops at 1998. We then use this procedure to make a data-driven statistical forecast of at least one similar event over the next decade.
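The following is a drastically simplified toy of the general idea in the abstract (bootstrap the catalog, refit the tail, and ask how likely at least one event above some threshold would have been). It is not Clauset and Woodard’s actual algorithm, and all the numbers in it are invented:

```python
# Toy bootstrap estimate of the probability of at least one catastrophic-sized event
# in a catalog, assuming a power-law tail. Synthetic data; illustrative only.
import numpy as np

rng = np.random.default_rng(1)
catalog = (rng.pareto(1.5, size=2000) + 1) * 10   # synthetic severities with a power-law tail above 10
threshold = 3000.0                                # an arbitrarily chosen "catastrophic" severity
x_min = 10.0

probs = []
for _ in range(500):
    resample = rng.choice(catalog, size=len(catalog), replace=True)
    tail = resample[resample >= x_min]
    alpha = 1.0 + len(tail) / np.sum(np.log(tail / x_min))     # tail exponent estimate
    p_single = (threshold / x_min) ** (1.0 - alpha)            # P(one event >= threshold)
    probs.append(1.0 - (1.0 - p_single) ** len(tail))          # P(at least one such event)

print(f"roughly {np.percentile(probs, 5):.0%} to {np.percentile(probs, 95):.0%}")
```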
#5: Power outages
Power outages could have many causes. Note that insofar as we can forecast the phenomena underlying the causes, this can be used to reduce, rather than simply forecast, power outages.
Poor load forecasting, i.e., electricity companies fail to forecast how much demand there will be and don’t prepare adequate supply. This is less of an issue in developed countries, where the power systems are more redundant (at some cost to efficiency). Note that here the power outage occurs due to a failure of a more mundane forecasting exercise, so forecasting the frequency of power outages from this cause is basically an exercise in calibrating the quality of that mundane forecasting.
Abrupt or significant shortages in fuel, often for geopolitical reasons. This therefore ties in with the general exercise of geopolitical forecasting (see my earlier post on the subject). This seems rare in the modern world, due to the considerable redundancy built into global fuel supplies.
Disruption of power lines or power supply units due to weather events. The most common causes appear to be lightning, ice, wind, rain, and flooding. This ties in with #3, and with my weather forecasting and climate forecasting posts. This is the most common cause of power outages in developed countries with advanced electricity grids (see, for instance, here and here).
Disruption by human or animal activity, including car accidents and animals climbing onto and playing with the power lines.
Perhaps the most niche source of power outages, one that many people may be unaware of, is geomagnetic storms (Wikipedia). These are low-frequency, low-probability events, but they can result in major power blackouts with potentially severe negative impact. Geomagnetic storms were discussed in past MIRI posts (here and here).
My impression is that when it comes to power outages, we are at Category 2 in forecasting. Load forecasting can identify seasons, times of the day, and special occasions when power demand will be high. Note that the infrastructure needs to be built to handle peak demand.
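As a toy illustration of the peak-demand point, here is a sketch of the kind of calculation involved, under the strong (and purely illustrative) assumption that the error in the peak-demand forecast is normally distributed; all the numbers are made up:

```python
# How much capacity margin keeps the chance of a shortfall below a target, if the
# peak-demand forecast error is (by assumption) normal? All figures are invented.
from scipy.stats import norm

forecast_peak_mw = 10_000   # forecast peak demand, MW (hypothetical)
forecast_sd_mw = 400        # assumed standard deviation of the forecast error
capacity_mw = 11_000        # installed capacity, MW (hypothetical)

p_shortfall = norm.sf(capacity_mw, loc=forecast_peak_mw, scale=forecast_sd_mw)
print(f"chance peak demand exceeds capacity: {p_shortfall:.2%}")

# Capacity needed to keep the shortfall probability below 1 in 1,000, same assumptions:
needed = norm.ppf(1 - 1e-3, loc=forecast_peak_mw, scale=forecast_sd_mw)
print(f"capacity needed for a 0.1% shortfall risk: {needed:,.0f} MW")
```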
We can’t quite be in Category 3, because in cases where we can forecast more finely, we could probably prevent the outage anyway.
What sort of preventive measures do people undertake with knowledge of the frequency of power outages? In places where power outages are more likely, people are more likely to have backup generators. People may be more likely to use battery-powered devices. If you know that a power outage is likely to happen in the next few days, you might take more care to charge the batteries on your devices.
#6: Server outages
In our increasingly connected world, websites going down can have a huge effect on the functioning of the Internet and of the world economy. As with power infrastructure, the complexity of server infrastructure needed to increase uptime increases very quickly. The point is that routing around failures at different points in the infrastructure requires redundancy. For instance, if any one server fails 10% of the time, and the failures of different components are independent, you’d need two servers to get to a 1% failure rate. But in practice, the failures aren’t independent. For instance, having loads of servers in a single datacenter covers the risk of any given server there crashing, but it doesn’t cover the risk of the datacenter itself getting disconnected (e.g., losing electricity, or getting disconnected from the Internet, or catching fire). So now we need multiple datacenters. But multiple datacenters are far from each other, so that increases the time costs of synchronization. And so on. For more detailed discussions of the issues, see here and here.
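Here is a minimal sketch of that arithmetic, with an invented probability for the shared datacenter-level failure added in, to show why replication inside a single datacenter stops helping at some point:

```python
# Redundancy arithmetic: independent replicas multiply failure probabilities, but a
# shared datacenter-level failure puts a floor under what replication can buy.
# The 10% per-server figure is from the example above; the 0.2% datacenter figure is invented.
p_server = 0.10
p_datacenter = 0.002

def downtime(replicas):
    # The service is down if the datacenter is down, or if all replicas happen to be down.
    return p_datacenter + (1 - p_datacenter) * p_server ** replicas

for n in (1, 2, 3, 4):
    print(f"{n} replica(s) in one datacenter: {downtime(n):.3%} downtime")
```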
My impression is that server outages are largely Category 1: we can use the probability of outages to determine the trade-off between the cost of having redundant infrastructure and the benefit of more uptime. There is an element of Category 2: in some cases, we have knowledge that traffic will be higher at specific times, and additional infrastructure can be brought to bear for those times. As with power infrastructure, server infrastructure needs to be built to handle peak capacity.
#7: Financial crises
The forecasting of financial crises is a topic worthy of its own post. As with climate science, financial crisis forecasting has the potential for heavy politicization, given the huge stakes both of forecasting financial crises and of any remedial or preventative measures that may be undertaken. In fact, the politicization and ideology problem is probably substantially worse in financial crisis forecasting. At the same time, real-world feedback arrives faster than in climate forecasting, providing more opportunity for people to update their beliefs and less scope for people to get away with sloppiness because their predictions take too long to evaluate.
A literally taken strong efficient market hypothesis (EMH) (Wikipedia) would suggest that financial crises are almost impossible to forecast, while a weaker reading of the EMH would merely suggest that the financial market is efficient (Wikipedia) in the sense that it’s hard to make money off the business of forecasting financial crises (for instance, you may know that a financial crisis is imminent with high probability, but the element of uncertainty, particularly with regard to timing, can destroy your ability to leverage that information to make money). On the other hand, there are a lot of people, often subscribed to competing schools of economic thought, who successfully forecast the 2007-08 financial crisis, at least in broad strokes.
Note that there are people who reject the EMH, yet claim that financial crises are very hard to forecast in a time-dependent fashion. Among them is Nassim Nicholas Taleb, as described here. Interestingly, Taleb’s claim to fame appears to have been that he was able to forecast the 2007-08 financial crisis, although it was more of a time-independent forecast than a specific timed call. The irony was noted by Jamie Whyte here in Standpoint Magazine.
I found a few sources of information on the forecasting of financial crises, which are discussed below.
Economic Predictions records predictions made by many prominent people and how they compared to what transpired. In particular, this page on their website notes how many of the top investors, economists, and bureaucrats missed the financial crisis, but also identifies some exceptions: Dean Baker, Med Jones, Peter Schiff, and Nouriel Roubini. The page also discusses other candidates who claim to have forecast the crisis in advance, and reasons why they were not included. While I think they’ve put a fair deal of effort into their project, I didn’t see good evidence that they have a strong grasp of the underlying fundamental issues they are discussing.
An insightful general overview of the financial crisis is found in Chapter 1 of Nate Silver’s The Signal and the Noise, a book that I recommend you read in its entirety. Silver notes four levels of forecasting failure.
The housing bubble can be thought of as a poor prediction. Homeowners and investors thought that rising prices implied that home values would continue to rise, when in fact history suggested this made them prone to decline.
There was a failure on the part of the ratings agencies, as well as by banks like Lehman Brothers, to understand how risky mortgage-backed securities were. Contrary to the assertions they made before Congress, the problem was not that the ratings agencies failed to see the housing bubble. Instead, their forecasting models were full of faulty assumptions and false confidence about the risk that a collapse in housing prices might present.
There was a widespread failure to anticipate how a housing crisis could trigger a global financial crisis. It had resulted from the high degree of leverage in the market, with $50 in side bets staked on every $1 that an American was willing to invest in a new home.
Finally, in the immediate aftermath of the financial crisis, there was a failure to predict the scope of the economic problems that it might create. Economists and policy makers did not heed Reinhart and Rogoff’s finding that financial crises typically produce very deep and long-lasting recessions.
Silver, Nate (2012-09-27). The Signal and the Noise: Why So Many Predictions Fail-but Some Don’t (pp. 42-43). Penguin Group US. Kindle Edition.
Silver finds a common thread among all the failures (emphases in original):
There is a common thread among these failures of prediction. In each case, as people evaluated the data, they ignored a key piece of context:
The confidence that homeowners had about housing prices may have stemmed from the fact that there had not been a substantial decline in U.S. housing prices in the recent past. However, there had never before been such a widespread increase in U.S. housing prices like the one that preceded the collapse.
The confidence that the banks had in Moody’s and S&P’s ability to rate mortgage-backed securities may have been based on the fact that the agencies had generally performed competently in rating other types of financial assets. However, the ratings agencies had never before rated securities as novel and complex as credit default options.
The confidence that economists had in the ability of the financial system to withstand a housing crisis may have arisen because housing price fluctuations had generally not had large effects on the financial system in the past. However, the financial system had probably never been so highly leveraged, and it had certainly never made so many side bets on housing before.
The confidence that policy makers had in the ability of the economy to recuperate quickly from the financial crisis may have come from their experience of recent recessions, most of which had been associated with rapid, “V-shaped” recoveries. However, those recessions had not been associated with financial crises, and financial crises are different.
There is a technical term for this type of problem: the events these forecasters were considering were out of sample. When there is a major failure of prediction, this problem usually has its fingerprints all over the crime scene.
Silver, Nate (2012-09-27). The Signal and the Noise: Why So Many Predictions Fail-but Some Don’t (p. 43). Penguin Group US. Kindle Edition.
While I find Silver’s analysis plausible and generally convincing, I don’t think I have enough of an inside-view understanding of the issue to evaluate it independently.
A few other resources that I found, but didn’t get a chance to investigate, are listed below:
Forecasting Financial Crisis website includes many publications related to financial crisis forecasting and prevention.
How to forecast a financial crisis (article in Financial Times).
Forecasting the Global Financial Crisis, a working paper by Daniela Bragoli.

#8: Pandemics
I haven’t investigated this thoroughly, but here are a few of my impressions and findings:
I think that pandemics stand in relation to ordinary epidemiology in the same way that extreme weather events stand in relation to ordinary weather forecasting. In both cases, the main way we can get better at forecasting the rare and high-impact events is by getting better across the board. There is a difference that makes the relation between moderate disease outbreaks and pandemics even more important than the corresponding case for weather: measures taken quickly to react to local disease outbreaks can help prevent global pandemics.
Chapter 7 of Nate Silver’s The Signal and the Noise, titled “Role Models”, discusses forecasting and prediction in the domain of epidemiology. The goal of epidemiologists is to obtain predictive models with a level of accuracy and precision similar to that of weather models. However, the greater complexity of human behavior, as well as the self-fulfilling and self-canceling nature of various predictions, makes the modeling problem harder. Silver notes that agent-based modeling (Wikipedia) is one of the commonly used tools. Silver cites a few examples from recent history where people were overly alarmed about possible pandemics, when the reality turned out to be considerably milder. However, the precautions taken due to the alarm may still have saved lives. Silver talks in particular about the 1976 swine flu outbreak (where the reaction turned out to be grossly disproportionate to the problem, and caused its own unintended consequences) and the 2009 flu pandemic.
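For readers who haven’t seen one, here is a toy simulation in the agent-based spirit Silver describes: each agent is individually susceptible, infected, or recovered, and infection spreads through random contacts. The parameters are invented and not calibrated to any real disease.

```python
# Toy agent-level susceptible-infected-recovered simulation. Invented parameters;
# this only illustrates the style of model, not any real epidemic.
import numpy as np

rng = np.random.default_rng(42)
n_agents = 10_000
state = np.zeros(n_agents, dtype=int)   # 0 = susceptible, 1 = infected, 2 = recovered
state[rng.choice(n_agents, size=10, replace=False)] = 1   # seed infections

contacts_per_day = 8
p_transmit = 0.03   # chance of transmission per contact with an infected agent
p_recover = 0.10    # daily chance an infected agent recovers

for day in range(365):
    n_infected = np.count_nonzero(state == 1)
    if n_infected == 0:
        break
    # Chance a susceptible agent gets infected today, assuming random mixing.
    p_infection = 1 - (1 - p_transmit * n_infected / n_agents) ** contacts_per_day
    new_infections = (state == 0) & (rng.random(n_agents) < p_infection)
    recoveries = (state == 1) & (rng.random(n_agents) < p_recover)
    state[new_infections] = 1
    state[recoveries] = 2

print(f"share of the population ever infected: {np.mean(state != 0):.1%}")
```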
In recent years, Google Flu Trends (website, Wikipedia) has been a common tool for identifying and taking quick action against the flu. Essentially, Google uses the volume of web searches for flu-related terms by geographic location to estimate the incidence of the flu by geographic location. It offers an early “leading indicator” of flu incidence compared to official reports, which are published after a time lag. However, Google Flu Trends has run into problems of reliability: news stories about the flu might prompt people to search for flu-related terms, even if they aren’t experiencing symptoms of the flu. Or it may even be the case that Google’s own helpful search query completions get people to search for flu-related terms once other people start searching for them. Tim Harford discusses the problems in the Financial Times here. I think Silver doesn’t discuss this (which is a surprise, since it would have fit well with the theme of his chapter).
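Here is a toy version of the underlying “nowcast” idea (regress the official, lagged numbers on search volume, then apply the fit to the current week’s searches); the data are synthetic, and this is of course not Google’s actual model:

```python
# Toy search-volume nowcast: fit official flu incidence against search volume on past
# weeks, then estimate the current week from search volume alone. Synthetic data.
import numpy as np

rng = np.random.default_rng(7)
weeks = 100
true_incidence = 50 + 40 * np.sin(np.linspace(0, 6, weeks)) + rng.normal(0, 3, weeks)
search_volume = 2.0 * true_incidence + rng.normal(0, 10, weeks)   # searches track illness, noisily

# Fit on past weeks, for which the official numbers have already been published.
slope, intercept = np.polyfit(search_volume[:-1], true_incidence[:-1], deg=1)

# Nowcast the current week, for which only search volume is available so far.
nowcast = slope * search_volume[-1] + intercept
print(f"nowcast: {nowcast:.1f}   (value published later: {true_incidence[-1]:.1f})")
```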
#9: Near-earth object impacts
I haven’t looked into this category in sufficient detail. I’ll list below the articles I read.
Wikipedia pages on asteroid-impact avoidance, near-earth object, B612 Foundation, Minor Planet Center, NEOShield, Sentry (monitoring system), Spaceguard, and Chelyabinsk meteor.
GiveWell’s shallow overview of asteroid detection.
The hazard of near-earth asteroid impacts (PDF) by Clark R. Chapman (gated copy), Earth and Planetary Science Letters, Volume 222, Issue 1, 15 May 2004, Pages 1–15.
A 500-kiloton airburst over Chelyabinsk and an enhanced hazard from small impactors, published in Nature after the Chelyabinsk meteor.