On characterizing heavy-tailedness

[CONTEXT: For a while I have been meaning to engage with a literature review on heavy tailed distributions. Instead of just indefinitely postponing the project I resolved to write some preliminary thoughts on the topic, so I can get started on understanding the concept better with a less daunting task]

TL;DR: There are many formalizations of heavy-tailedness out there. I define five intuitive principles that I expect a good definition to satisfy: it should be action-relevant, distinguish negative from positive risk, allow finite support, apply to empirically observed phenomena, and provide a characterization in terms of a universal class of distributions. I discuss each in turn and provide examples.

Informally, a distribution is heavy-tailed when extreme, low-probability yet plausible outcomes dominate decision-making.

For example, when considering how to contain a pandemic, an official will not want to focus on low-impact scenarios where eg the pandemic dies out on its own, nor on implausible scenarios where eg a solar flare messes up our electronics during the crisis. Instead she will focus on scenarios where eg the pandemic grows out of control because its contagion rate is higher than expected: a plausible scenario that, albeit unlikely, is disastrous enough to warrant precaution.

Heavy-tailed distributions are an important object of study in cause prioritization—we should focus on studying such distributions to the extent that extreme outcomes dominate long term decision-making.

My informal impression is that the notion of heavy-tailed distributions has been heavily discussed among mathematicians, especially in the context of extreme value theory. However, there is no single agreed-upon formalization of the concept, which makes discussing and applying it notoriously difficult.

In this post, I will explore some important concepts around heavy-tailedness that we want an ideal definition to make precise.

My hope is that having this discussion will later help us productively discuss the strengths and weaknesses of different proposed definitions of heavy-tailedness.

In short, an ideal definition of heavy-tailedness would be action-relevant, distinguish risks from hits, be well-defined for distributions with finite support, describe natural phenomena and ascribe heavy-tailedness to a universal class of distributions.

Action relevance

We hope that the definition of heavy-tailedness will suggest a qualitatively different approach to statistical inference and decision-making.

For example, we would like heavy-tailed distributions to simplify decision-making (eg, via a dominance result that recommends never exposing ourselves to heavy-tailed risks) or to show the inadequacy of standard methods (eg, a result showing in a precise sense that historical data on a heavy-tailed distribution is not a good predictor of future performance).

To the extent that heavy-tailed distributions are already handled well by standard methods, we will be better off not introducing a new concept.

For example, a formalization of heavy-tailedness based on non-finite second-order central moments (ie variance), or one that implies them, would satisfy the criterion of action relevance: it would imply that the mean of iid heavy-tailed variables does not necessarily converge to a normal distribution, which is a common and load-bearing assumption in statistics.
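To make this concrete, here is a minimal simulation sketch using only the Python standard library. It compares how fast the spread of the sample mean shrinks when the sample size grows 100-fold, for a finite-variance distribution (exponential) and for one with finite mean but infinite variance (Pareto with shape 1.5). The distributions, sample sizes, seed and helper names are my own illustrative choices. Under the classical central limit theorem the spread should shrink by about 10x; for this Pareto, stable-law scaling suggests only about 100^(1/3), roughly 4.6x.

```python
import random
import statistics

def iqr_of_sample_means(draw, n, trials, rng):
    """Interquartile range of the mean of n iid draws, estimated over many trials."""
    means = [sum(draw(rng) for _ in range(n)) / n for _ in range(trials)]
    q1, _, q3 = statistics.quantiles(means, n=4)
    return q3 - q1

def shrink_factor(draw, small_n=50, large_n=5000, trials=300, seed=0):
    """How much the spread of the sample mean shrinks when n grows 100-fold."""
    rng = random.Random(seed)
    small = iqr_of_sample_means(draw, small_n, trials, rng)
    large = iqr_of_sample_means(draw, large_n, trials, rng)
    return small / large

# Finite variance: exponential with rate 1.
finite_var = shrink_factor(lambda rng: rng.expovariate(1.0))
# Finite mean but infinite variance: Pareto with shape 1.5 (illustrative choice).
infinite_var = shrink_factor(lambda rng: rng.paretovariate(1.5))

print(f"exponential: spread of the mean shrinks by about {finite_var:.1f}x")
print(f"Pareto(1.5): spread of the mean shrinks by about {infinite_var:.1f}x")
```

I use the interquartile range rather than the standard deviation to measure spread, since the standard deviation is itself unreliable when the variance is infinite.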

Distinguishing left and right tails

Extreme outcomes take two forms: extreme negative outcomes (risks) and extreme positive outcomes (hits).

For example, a calamity such as drastic, unexpected, sudden climate change melting the poles and causing massive floods would be a risk. Meanwhile, an unexpected discovery of a cure against cancer would count as a hit.

In cause prioritization, we hope to expose ourselves to hits while minimizing risks. Thus we want our discussion of heavy-tailedness to distinguish between the two.

For example, a definition of heavy-tailedness formalized as leptokurtic distributions is unsatisfying in this sense, since there is no meaningful way to talk about right and left leptokurtosis.
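A quick sketch of why kurtosis cannot tell the two tails apart: it is built from even powers of deviations from the mean, so flipping the sign of every outcome leaves it unchanged. The lognormal sample and the names below are illustrative assumptions of mine.

```python
import random

def excess_kurtosis(xs):
    """Sample excess kurtosis: fourth central moment over squared variance, minus 3."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / m2 ** 2 - 3.0

rng = random.Random(0)
# A sample with a heavy right tail (extreme "hits") ...
hits = [rng.lognormvariate(0.0, 1.0) for _ in range(10_000)]
# ... and its mirror image, with a heavy left tail (extreme "risks").
risks = [-x for x in hits]

k_hits = excess_kurtosis(hits)
k_risks = excess_kurtosis(risks)
print(k_hits, k_risks)  # the two values coincide
```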

However, the formalism of subexponential distributions makes it easy to distinguish left and right fat-tailedness.
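For reference, here is one common way of stating the subexponential condition, written as a LaTeX sketch. The symbols X, X_1, X_2 and the positive-part convention are notation I am introducing here, and other texts may phrase the definition differently.

```latex
% X_1, X_2 are iid copies of a nonnegative random variable X with unbounded support.
% X is subexponential if a large sum is typically caused by one large summand:
\[
  \lim_{x \to \infty} \frac{\Pr(X_1 + X_2 > x)}{\Pr(X_1 > x)} = 2 .
\]
% For a real-valued X, one convention is to test each tail separately:
% the right tail via the positive part \max(X, 0),
% the left tail via the positive part of the mirrored variable, \max(-X, 0).
```

Since the condition is stated for one tail at a time, a distribution can have a subexponential right tail and a light left tail, or vice versa.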

Allowing finite support

Reality is inherently bounded—I can confidently assert that there is no possible risk today that would endanger a trillion lives, because I am confident the number of people on the planet is well below that.

In statistics, we usually resort to distributions over unbounded possible outcomes to simplify matters. This is usually admissible, since most of the probability mass is contained in a sensible-enough finite region, and thus the probability mass assigned to absurd outcomes can be treated as a rounding error.

However, when discussing heavy-tailed distributions, we are precisely studying the region of extreme outcomes. If our definition of heavy tailedness requires the distribution to have infinite support, we risk our analysis focusing on absurd outcomes.

Definitions of heavy-tailedness that rely on asymptotic behaviour, such as the definition of power laws, do not allow finite support. In contrast, notions of heavy-tailedness based on measures of inequality, such as the Gini coefficient, do allow finite support.
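As an illustration, the sketch below computes the Gini coefficient for two samples with bounded support: a Pareto truncated at a finite cap, and a uniform. The shape parameter, the cap and the function names are hypothetical choices of mine. The point is only that the measure is well-defined on bounded data and still separates the concentrated sample from the flat one.

```python
import random

def gini(values):
    """Gini coefficient of nonnegative values, via the sorted-rank formula."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n

def truncated_pareto(rng, shape, cap):
    """Inverse-CDF draw from a Pareto with minimum 1, truncated to [1, cap]."""
    u = rng.random()
    return (1.0 - u * (1.0 - cap ** -shape)) ** (-1.0 / shape)

rng = random.Random(0)
cap = 1e6  # hard upper bound: no outcome can exceed this

bounded_heavy = [truncated_pareto(rng, shape=1.1, cap=cap) for _ in range(10_000)]
bounded_flat = [rng.uniform(0.0, 1.0) for _ in range(10_000)]

g_heavy = gini(bounded_heavy)
g_flat = gini(bounded_flat)
print(f"truncated Pareto: Gini = {g_heavy:.2f}")
print(f"uniform:          Gini = {g_flat:.2f}")
```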

Describing natural phenomena

Many everyday phenomena, such as human height, are documented to be approximately normally distributed.

Similarly, if the notion of heavy-tailedness is to be useful, we would expect it to show up in many decision-relevant scenarios. Thus we would hope to identify many empirical distributions that conform to our definition of heavy-tailedness.

This also suggests a different approach to formalizing the concept—instead of starting from the a priori requirements, we could work first on identifying heavy tailed distributions, and developing a useful language to study them by looking at particular cases.

Empirical regularities usually considered to be heavy-tailed include Zipf's law and Benford's law.

Universality

Normal distributions are heavily studied in statistics because they arise as the limiting distribution of the suitably rescaled mean of iid variables of finite variance.

This provides theoretical reassurance that treating the mean of some unknown distributions that empirically exhibit finite variance as if it were normal will be good enough for inference and decision-making.

Analogously, we would like our definition of heavy-tailedness to apply to, and ascribe heavy-tailedness to, a general limiting class of distributions, so we can use it to study general distributions.

The universal class of distributions that comes up again and again when discussing heavy-tailed distributions is the class of Lévy alpha-stable distributions. Thus we would expect our definition to apply to this class, and to provide a characterization of heavy-tailedness as a function of the parameters of the class.
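To get a feel for how the stability parameter alpha controls tail weight, here is a minimal sampling sketch for the symmetric case, using the Chambers-Mallows-Stuck transform and only the standard library. I assume unit scale and zero skewness; the function names, the threshold of 6 and the sample size are illustrative. With alpha = 2 the draws are Gaussian (with variance 2 in this parametrization), and with alpha = 1 they are Cauchy.

```python
import math
import random

def symmetric_stable(alpha, u, w):
    """Chambers-Mallows-Stuck transform for a symmetric alpha-stable draw.

    u is uniform on (-pi/2, pi/2), w is exponential with rate 1, 0 < alpha <= 2.
    """
    return (math.sin(alpha * u) / math.cos(u) ** (1.0 / alpha)
            * (math.cos(u - alpha * u) / w) ** ((1.0 - alpha) / alpha))

def sample(alpha, size, seed=0):
    """Draw `size` symmetric alpha-stable variates with unit scale."""
    rng = random.Random(seed)
    draws = []
    for _ in range(size):
        u = rng.uniform(-math.pi / 2, math.pi / 2)
        w = rng.expovariate(1.0)
        draws.append(symmetric_stable(alpha, u, w))
    return draws

def share_beyond(xs, threshold):
    """Fraction of draws whose absolute value exceeds the threshold."""
    return sum(abs(x) > threshold for x in xs) / len(xs)

heavy_share = share_beyond(sample(1.5, 20_000), 6.0)     # alpha < 2: power-law tails
gaussian_share = share_beyond(sample(2.0, 20_000), 6.0)  # alpha = 2: Gaussian tails

print(f"alpha = 1.5: share of draws beyond 6 is {heavy_share:.4f}")
print(f"alpha = 2.0: share of draws beyond 6 is {gaussian_share:.4f}")
```

A definition of heavy-tailedness that is universal in the sense above would presumably classify every member of this family with alpha below 2 as heavy-tailed and the Gaussian member as not.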

We have discussed some properties that we would like a good formalization of the concept of heavy-tailedness to have.

There are several paths we could take from here, including:

Refining the properties where possible, expanding them with more examples, contesting their desirability
Conducting a review of existing formalisms related to the concept of heavy-tailedness
Studying how the properties interact with each other, and hoping to shed light on a tentative definition—or an impossibility result
Collecting a sample of empirical and theoretical distributions commonly considered to be heavy-tailed, to reflect on what makes them heavy-tailed

The topic of heavy-tailedness is one that I have seen used and abused in many situations, and I think that developing a shared understanding of what it means in a precise sense will help us communicate better and make better decisions.

We cannot rule out the possibility that this is a dead research path. For example, our intuitive understanding of the topic might be good enough for decision-making, the formalization may be beyond our current mathematics, or the notion of heavy-tailedness might be misleading in the sense of not requiring a separate treatment from non-heavy-tailed distributions.

Nevertheless, I think that this is a research path worth exploring, and I would be keen on reading more on the topic. Let me know in the comments if you have further research ideas, clarifying concepts or questions of your own.

This blogpost was written by Jaime Sevilla, visiting researcher at the Centre for the Study of Existential Risk, under a grant from the Effective Altruism Foundation. I’d like to thank Max Daniel and Ronja Lutz for conducting some preliminary research on the topic with me a while ago.