If I understand Taleb correctly, his objection is that if X’s distribution’s upper tail tends to a power law with small enough (negated) exponent α, then sample proportions of X going to the distribution’s top end are inconsistent under aggregation, and suffer a bias that decreases with sample size. And since the Gini coefficient is such a measure, it has these problems.
However, Taleb & Douady give me the impression that the quantitative effect of these problems is substantial only when α is appreciably less than 2. (The sole graphical example for which T&D mention a specific α, their figure 1, uses α = 1.1). But I have a hard time seeing how α can really be that small for income & wealth, because that’d imply mean income & mean wealth aren’t well-defined in the population, which must be false because no one actually has, or is earning, infinitely many dollars or euros.
[Edit after E_N’s response: changed “a bias that rises with sample size” to “a bias that decreases with sample size”, I got that the wrong way round.]
But I have a hard time seeing how α can really be that small for income & wealth, because that’d imply mean income & mean wealth aren’t well-defined in the population,
Um, no. They’re not well defined over the distribution; they will certainly be well defined over a finite population.
which must be false because no one actually has, or is earning, infinitely many dollars or euros.
You seem to be confused about how distributions with infinite means work. Here’s a good exercise: get some coins and flip them to obtain data in a St. Petersburg distribution; notice that even though the distribution has infinite mean, all your data points are still finite (and quite small).
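The coin-flip exercise can be run in software instead of with physical coins. A minimal sketch in Python, assuming the standard St. Petersburg payoff scheme (start at 2, double on every tail):

```python
import random

random.seed(0)

def st_petersburg_draw():
    # Flip a fair coin until it lands heads; the payoff starts at 2 and
    # doubles with every tail, so E[payoff] = sum over k of (1/2**k)*2**k,
    # which diverges.
    payoff = 2
    while random.random() < 0.5:  # tails: keep flipping
        payoff *= 2
    return payoff

draws = [st_petersburg_draw() for _ in range(10_000)]

# Every draw is finite, and the typical draw is tiny, despite the infinite mean.
print("max:", max(draws), "median:", sorted(draws)[len(draws) // 2])
```

The running sample mean, by contrast, drifts upward erratically as the sample grows, which is exactly the infinite-mean behaviour showing itself.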
Um, no. They’re not well defined over the distribution; they will certainly be well defined over a finite population.
I’m lost. A statistical distribution characterizes a population (whether the population is an abstract construction or a literal concrete population); if the mean isn’t well-defined for the population it oughtn’t be well-defined for the distribution allegedly characterizing the population.
Taking annual income for concreteness, the support of a power law distribution would include, for example, $69 quadrillion. But no one actually earns so much (global economic activity, denominated in dollars, is simply too small), so the support of the actual annual income distribution must exclude $69 quadrillion. Consequently the actual annual income distribution and the power law distribution cannot actually be the same distribution; they have different support.
You seem to be confused about how distributions with infinite means work. Here’s a good exercise: get some coins and flip them to obtain data in a St. Petersburg distribution; notice that even though the distribution has infinite mean, all your data points are still finite (and quite small).
In the case of the St. Petersburg distribution one defines an abstract data-generating process which, by construction, implies a particular distribution with infinite mean. In the case of people’s incomes or wealth, by contrast, we know that the output of the data-generating process is constrained from above by the size of the economy, so the resulting population (and the distribution representing that population) must have finite mean income and finite mean wealth. (It’s as if we were talking about an imperfect real-life instantiation of the St. Petersburg process where we knew the casino had a limited amount of money.)
Consequently the actual annual income distribution and the power law distribution cannot actually be the same distribution; they have different support.
Every actual population differs from a parameterised mathematical function with few parameters, and for pretty much anything you can measure, if the mathematical distribution has infinite support, there will be some reason that the population cannot. But the question to ask is not, are they different, but, does the difference make a difference?
The way to answer this question is to repeat the analysis in the paper Eugine cited using a truncated power law. The bounds must be placed at the limits of what is possible, not at the accidental maximum and minimum values observed in the current population, as the point here is that the population is not fully exploring the tails.
I have not done this, but I did once do a simulation for the Cauchy distribution (which has no mean), finding empirically the standard deviation of the mean of samples of size N. Each individual set of N values has a mean, but they will be wildly different for different samples. Increasing N does not reduce the effect for any practical value of N (and I did this in Matlab, which is optimised for fast number-crunching on arrays). This is completely different from what happens for sample means drawn from distributions with finite mean and variance, whose means converge with increasing N to the population mean.
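That Matlab experiment is easy to reproduce. A sketch in Python (using the interquartile range of the sample means rather than their standard deviation, since for Cauchy data the empirical standard deviation is itself wildly unstable from run to run):

```python
import numpy as np

rng = np.random.default_rng(0)

# The mean of N iid standard Cauchy variates is itself standard Cauchy,
# so its spread should not shrink at all as N grows.
iqrs = {}
for n in [10, 1_000, 100_000]:
    means = np.array([rng.standard_cauchy(n).mean() for _ in range(400)])
    q25, q75 = np.percentile(means, [25, 75])
    iqrs[n] = q75 - q25  # stays near 2 (the IQR of a standard Cauchy) for every n

print(iqrs)
```

For comparison, replacing `standard_cauchy` with `standard_normal` makes the same spread fall like 1/sqrt(N), which is the finite-variance behaviour described above.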
For my experiment with the Cauchy distribution, not a single one of my samples had to be rejected due to exceeding the limits of finite precision arithmetic. The absence of infinite tails from the samples made no difference to the experimental results, even though it is the presence of those infinite tails that gives the Cauchy distribution its lack of moments.
This may look like a paradox. You have two distributions, the Cauchy distribution and its truncation at 1e50 or wherever. The former has no moments, and the latter does. Yet the empirical behaviour of samples drawn from the latter agrees with mathematical analysis of the former, even though in the latter case the standard deviation of the sample mean must converge with increasing sample size to zero, and in the former case it remains infinite.
The resolution of this paradox lies in the fact that as the variance of a distribution that has a finite variance becomes larger and larger, the rate of convergence of sample means becomes slower and slower. For the Cauchy distribution truncated at +/- X and a sample size of N, for large X and N the variance of the sample mean is proportional to X/N. If we take the limit of this as X goes to infinity, we get infinity, independent of N. If we take the limit as N goes to infinity we get zero, independent of X. The behaviour found when both X and N are finite will depend on which is bigger. When X is very large, even the entire population (conceived as a sample from an underlying data-generation process) may not give a good estimate of the distribution mean.
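The X/N scaling needn’t even be simulated: the truncated Cauchy has mean zero by symmetry, so the variance of the sample mean is just E[x²]/N, and E[x²] has a closed form. A quick check (the constant 2/π here is specific to the standard Cauchy):

```python
import numpy as np

def var_of_mean(x_max, n):
    # Standard Cauchy truncated to [-x_max, x_max]:
    #   normaliser Z = (2/pi) * arctan(x_max)
    #   E[x^2]       = (2/pi) * (x_max - arctan(x_max)) / Z  ~  (2/pi) * x_max
    # so var(sample mean) = E[x^2] / n  ~  2 * x_max / (pi * n).
    z = (2 / np.pi) * np.arctan(x_max)
    second_moment = (2 / np.pi) * (x_max - np.arctan(x_max)) / z
    return second_moment / n

print(var_of_mean(1e6, 1_000))  # roughly 2e6 / (pi * 1e3), i.e. about 637
print(var_of_mean(2e6, 1_000))  # doubling X doubles it
print(var_of_mean(1e6, 2_000))  # doubling N halves it
```

Which of the two opposing limits dominates in practice depends, as stated above, on whether X or N is bigger.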
Taleb and Douady’s point is that for a power law distribution, wealth owned by the top 1% is subject to this phenomenon. A larger population will explore more of the tail of the distribution, and unlike the normal distribution, the tail is fat enough to give a different value for the statistic. The “true” distribution does not have to actually have infinite support, for the entire population of a country to be insufficient to explore the tails.
The authors draw the implication that as both population and technological development grow, the top 1% will be found to have larger proportions of the wealth, not because of any change in the mechanisms of society to favour them, but because more of the sample space is being explored. “So examining times series, we can easily get a historical illusion of rise in wealth concentration when it has been there all along.” (Presumably one could quantify the effect and correct for it.)
A possibility that the paper does not raise is that instead of calculating the actual wealth held by the actual top 1%, you could estimate the Gini coefficient from the whole population, and calculate a theoretical 1% wealth. This may be substantially more. The authors suggest that Pareto’s empirical observation of the 80⁄20 rule, which implies 53% wealth held by the top 1%, might actually correspond to a figure of 70%.
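The 53% figure follows from the standard Pareto identity that the top fraction p of the population holds the share p^(1−1/α) of the total. A quick check (T&D’s 70% is their bias-corrected estimate and does not follow from this identity alone):

```python
import math

# Pareto top-share identity: share(p) = p ** (1 - 1/alpha).
# The 80/20 rule (top 20% hold 80%) pins down alpha:
c = math.log(0.8) / math.log(0.2)  # c = 1 - 1/alpha
alpha = 1 / (1 - c)                # about 1.16

top1 = 0.01 ** c                   # share held by the top 1%
print(round(alpha, 2), round(top1, 2))  # about 1.16 and 0.53
```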
This could be spun in opposite ways. If you want to boom freedom and boo levellers, you can point to this and say there’s always more room at the top. If you want to boom equality and boo the rich, you can say that the true situation is even worse than the 1% figure says, indeed that the figure is a systematic underestimate, a piece of evil propaganda used by the rich to conceal the true extent of the inequality inherent in the system.

Take your pick.
Thanks for your detailed reply, which seems more responsive to me than Eugine_Nier’s.
Every actual population differs from a parameterised mathematical function with few parameters, and for pretty much anything you can measure, if the mathematical distribution has infinite support, there will be some reason that the population cannot. But the question to ask is not, are they different, but, does the difference make a difference?
Yes, just so.
The way to answer this question is to repeat the analysis in the paper Eugine cited using a truncated power law. The bounds must be placed at the limits of what is possible, not at the accidental maximum and minimum values observed in the current population, as the point here is that the population is not fully exploring the tails.
OK, so, I think this has helped me pinpoint the root of the disagreement here: we have different beliefs about what the relevant statistical population is. Here are four possibilities, in increasing order of expansiveness.
1. The statistical population is whatever big sample one can get one’s hands on.
2. The statistical population is the literal population of (working-age? adult?) people in a place of interest at a given time.
3. The statistical population is that implied by “a truncated power law”, with its bounds “placed at the limits of what is possible”.
4. The statistical population is that implied by a power law with only a lower bound.
(1) is what E_N apparently believes I endorse, even though it’s obviously a stupid choice based on confusing sample & population. I actually contend (2). You propose (3). T&D seem to assume (4).
In other circumstances one could simply elide the differences between these distributions, as there’d be no reason to expect the precise choice to matter much; the analysis being carried out would be insensitive to it. But here T&D’s analysis rests on a particular subtle, pathological feature, and the chance of that feature being present likely depends on which population one chooses.
Given that a choice should be made, I think (2) is a more natural, sensible choice than (3) & (4) for the purposes of estimating inequality at a given point in time, because (2) refers to the actually existing population of interest. (4) is a mathematical abstraction which might be a useful approximation in other circumstances, but risks spuriously introducing a pathological feature here. (3) better matches reality (having an upper bound) but is less parsimonious and harder to operationalize; how do we determine “the limits of what is possible”? And which upper limit ought one use — the maximum individual income/wealth that one’d witness if one could see into the pre-ordained future of humanity, or the maximum individual income/wealth possible under some counterfactual ensemble of futures, or...?
We might choose (3) or (4) in spite of these issues if we wished to predict the future course of inequality, because then we’d need to go beyond people who currently exist (and could simply have their incomes observed) and start modelling the distribution of incomes which haven’t been earned yet. But if we’d just like an index of current (or past) inequality, our interest is in the population of people who exist now (or existed at a given time in the past).
If God handed a data file with the income of everybody on the planet to an economist, and the economist used that to calculate some inequality index, the economist wouldn’t wring their hands about that index being a noisy sample statistic; they’d consider it the precise population value of the Gini coefficient (or whatever coefficient) for the world, and mark that job as done. (At least until they had to produce next year’s statistics, and needed fresh data!) In T&D’s notation, the number they’d come up with wouldn’t be κ-hat but κ. Complaints about κ-hat systematically diverging from κ would therefore be irrelevant.
Returning to what you wrote, yes, the way to answer the question is to repeat T&D’s analysis with a different distribution — but if I used a truncated power law it would be because it matched the empirical distribution of income/wealth well. (It would also be useful to see how κ-hat and κ differed under different sampling strategies; economists often deliberately try to oversample the upper end of the wealth distribution by using tax data or rich lists, and I’d expect that to lessen the small-sample bias T&D identify.)
OK, so, I think this has helped me pinpoint the root of the disagreement here: we have different beliefs about what the relevant statistical population is. Here are four possibilities, in increasing order of expansiveness.
1. The statistical population is whatever big sample one can get one’s hands on.
2. The statistical population is the literal population of (working-age? adult?) people in a place of interest at a given time.
3. The statistical population is that implied by “a truncated power law”, with its bounds “placed at the limits of what is possible”.
4. The statistical population is that implied by a power law with only a lower bound.
(1) is what E_N apparently believes I endorse, even though it’s obviously a stupid choice based on confusing sample & population. I actually contend (2). You propose (3). T&D seem to assume (4).
I introduced (3) to demonstrate that it has almost the same observational properties as (4). If the bounds are drawn widely enough that no member of a population of the given size drawn from the full distribution is likely to exceed the truncation, then that population has exactly the same properties as if drawn from the full distribution.
The proper concept of “population” is an important issue in statistics, the two main concepts being “all actually existing examples” (your (2)) and “all hypothetical examples that could be produced by the causal process responsible for creating the existing examples, in the proportions that they would be produced” ((3) and (4)). Each of these concepts has its uses, but both are called “the population”. This can produce confusion. There is no such thing as the “right” concept, only the concept that is relevant to whatever the context is.
I believe that some statisticians argue that the “hypothetical” concept of population is moonshine, but I don’t know if that view has a serious following. (I am not a statistician.) WWJS? (What Would Jaynes Say?)
If you just want to describe the existing population, e.g. finding what proportion of the wealth is currently owned by the currently richest 1% of the population, then you are talking about the first. If you want to make predictions about other populations, e.g. what proportion of the wealth will be owned by the top 1% if the population doubles and the Gini coefficient remains the same, then the hypothetical concept is involved: you have to calculate the expected value of the top centile on the basis that the new population is a random sample from the full distribution.
The practical point made by the Taleb and Douady paper, put in terms of observations of actual populations, is that for a fixed value of the Gini coefficient, as the population grows, the actual top 1% will come to own an increasing fraction of the total wealth. This will happen not because of any change in the causal mechanisms by which people acquire differing wealth, but just because the population is larger and is statistically likely to explore more of the fat tail. Observing such an increase therefore cannot be used to argue that the rich are being increasingly favoured, unless a correction is first made to account for this effect.
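This can be illustrated numerically. A sketch in Python, assuming a pure Pareto mechanism with the α = 1.1 of T&D’s figure 1 (the qualitative point only, not their exact computation):

```python
import numpy as np

rng = np.random.default_rng(2)
ALPHA = 1.1  # tail exponent, as in T&D's figure 1

def top_share(wealth, fraction=0.01):
    # Fraction of total wealth held by the richest `fraction` of the sample.
    k = max(1, int(len(wealth) * fraction))
    richest = np.partition(wealth, -k)[-k:]
    return richest.sum() / wealth.sum()

# Same generating mechanism, two population sizes, averaged over 300 runs each.
shares = {}
for n in [1_000, 100_000]:
    shares[n] = float(np.mean(
        [top_share(rng.pareto(ALPHA, n) + 1.0) for _ in range(300)]))

print(shares)  # the larger population shows a larger top-1% share
```

Nothing about the mechanism changes between the two rows; only the amount of tail being explored does.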
As a footnote, I also note that the paper does not consider the issue of how accurately the power law fits the distribution of wealth, especially at its extremes. By definition, if the entire population of the country has only explored a certain way into the tail, it provides no evidence for how far the power law distribution continues beyond that point. One might justify the use of the power law as being some sort of maxent prior (I don’t know if it is), but that amounts to the same thing: a profession of maximal ignorance about the whole distribution, conditional on having observed the actual population, and one would have to be willing to update to a different law on discovering that the tail actually stopped. If one was doing science, and had a well-understood mechanism that could be demonstrated to yield a power law distribution, then one would have more solid grounds for applying it.
I agree with every paragraph there but one. Unfortunately the paragraph I disagree with is the important one.
The practical point made by the Taleb and Douady paper, put in terms of observations of actual populations, is that for a fixed value of the Gini coefficient, as the population grows, the actual top 1% will come to own an increasing fraction of the total wealth. This will happen not because of any change in the causal mechanisms by which people acquire differing wealth, but just because the population is larger and is statistically likely to explore more of the fat tail. Observing such an increase therefore cannot be used to argue that the rich are being increasingly favoured, unless a correction is first made to account for this effect.
If that is the practical point made by T&D — and I’m not sure it is — it seems to me fallacious. Doesn’t it amount to saying, “the parameters baked into the underlying data-generating process we posit may be constant, so who cares that the actually existing level of inequality is increasing?” It confuses what I really care about (actual inequality) with the imperfect model (the power law) of what I really care about.
There’s also a risk of equivocating about what “the rich are being increasingly favoured” means. If it means “the power-law-producing process which, by assumption, decides people’s incomes & wealth is itself changing over time to favour the rich”, then that would be falsified by the underlying power law being constant over time. If it actually means instead e.g. “the economy will distribute a growing proportion of its output to the rich” or “underlying causal economic processes enable positive feedback loops that engender Matthew effects for income and/or wealth”, then it is quite consistent with the underlying power law being constant over time.
If that is the practical point made by T&D — and I’m not sure it is — it seems to me fallacious. Doesn’t it amount to saying, “the parameters baked into the underlying data-generating process we posit may be constant, so who cares that the actually existing level of inequality is increasing?”
Taleb is not talking about what one might care about, but pointing out some mathematical facts about fat-tailed distributions (distributions with a power-law tail) and quantiles. The population top centile, he shows, can be a substantially biased underestimate of the distribution top centile, even for very large populations (100 million).
Among the consequences of this is that if you take ten countries, each with the same value for the top centile, and aggregate them, the top centile of the combined population may be substantially larger. In that situation, what is the “real” fraction of wealth of the top 1%? The value for any individual country, or the value for the aggregate? Would the answer depend on whether the countries were all economically isolated from each other?
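The aggregation effect can be sketched the same way, assuming ten hypothetical countries all drawn from one Pareto law with α = 1.1 (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(3)
ALPHA, N_COUNTRY, N_COUNTRIES = 1.1, 10_000, 10

def top_share(wealth, fraction=0.01):
    k = max(1, int(len(wealth) * fraction))
    richest = np.partition(wealth, -k)[-k:]
    return richest.sum() / wealth.sum()

per_country, aggregate = [], []
for _ in range(100):
    countries = [rng.pareto(ALPHA, N_COUNTRY) + 1.0 for _ in range(N_COUNTRIES)]
    per_country.append(np.mean([top_share(c) for c in countries]))  # country-level
    aggregate.append(top_share(np.concatenate(countries)))          # pooled

print(np.mean(per_country), np.mean(aggregate))
```

On average the pooled top centile owns a larger share than the typical country-level top centile, even though every country follows the same law.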
It confuses what I really care about (actual inequality) with the imperfect model (the power law) of what I really care about.
I’ve never understood what people find so objectionable about inequality. Poverty, yes, inequality, no. To me, what is wrong with some people being rich while others starve is only that some starve. It bothers me not at all that among those who are not poor, some are vastly wealthier than others, even though I am not one of them.
If your objection to the richness of the rich is a claim that it is causally responsible for the poorness of the poor, then you are interested in underlying mechanisms described by the distribution.
Taleb is not talking about what one might care about, but pointing out some mathematical facts about fat-tailed distributions (distributions with a power-law tail) and quantiles.
In that situation, what is the “real” fraction of wealth of the top 1%? The value for any individual country, or the value for the aggregate?
Whichever “is relevant to whatever the context is”. If the actual region of interest were the aggregate, the relevant value would be that for the aggregate.
If your objection to the richness of the rich is a claim that it is causally responsible for the poorness of the poor, then you are interested in underlying mechanisms described by the distribution.
The mechanisms entail a particular distribution, or family of distributions, and in this case the mechanisms are underspecified given the distribution. (Hence my pointing out that the phrase “the rich are being increasingly favoured” “is quite consistent with the underlying power law being constant over time”.)
If your objection to the richness of the rich is a claim that it is causally responsible for the poorness of the poor, then you are interested in underlying mechanisms described by the distribution.
The mechanisms entail a particular distribution, or family of distributions, and in this case the mechanisms are underspecified given the distribution. (Hence my pointing out that the phrase “the rich are being increasingly favoured” “is quite consistent with the underlying power law being constant over time”.)
“Increasingly favoured” is a statement about mechanisms. If the mechanisms remain the same as the population increases, then every individual who ends up rich is still working within the same mechanisms. It’s just that when the population increases, chance alone, operating on the same mechanisms, will produce many richer individuals.
Technically, the population top centile is a biased underestimate of the distribution value even for a thin-tailed distribution, but for those distributions the bias is unmeasurably small with country-sized populations.
“Increasingly favoured” is a statement about mechanisms.
When you personally state it, or in general when it’s stated by anyone? If the former, fair enough. If the latter, I disagree (cf. the last paragraph of an earlier comment). As a down-to-earth example, if I lost a game of Monopoly to you and afterwards ruefully remarked that “the dice were favouring you more & more towards the end”, I would not automatically be accusing you of having swapped the original dice for a different set of dice partway through the game.
If the mechanisms remain the same as the population increases, then every individual who ends up rich is still working within the same mechanisms. It’s just that when the population increases, chance alone, operating on the same mechanisms, will produce many richer individuals.
Right (accepting arguendo the premise that the relevant mechanisms imply a power law-like distribution with α appreciably less than 2). And in that situation one could react as T&D might, i.e. by arguing that since the mechanisms remain the same, the resulting increase in inequality is chimerical. Alternatively, one could react as I would, i.e. by arguing that an increase in some quantitative property of a concrete group of people does not magically become chimerical just because it arises from immutable mechanisms. (This is essentially the disagreement I laid out before between definitions (3) & (4) and definition (2) of the relevant statistical population.)
“Increasingly favoured” is a statement about mechanisms.
When you personally state it, or in general when it’s stated by anyone?
OK, those words could be used for either meaning, to the confusion of the discussion. T&D do say:
So examining times series, we can easily get a historical illusion of rise in wealth concentration when it has been there all along.
which could also be read in either sense. But the population centile and the mechanisms whereby people get rich are what they are. They are both real things, the divergence of which does not make either less real than the other.
A possibility that the paper does not raise is that instead of calculating the actual wealth held by the actual top 1%, you could estimate the Gini coefficient from the whole population, and calculate a theoretical 1% wealth.
Taleb would probably object on the grounds that the above will lead to misleading results if the population is actually composed of a superposition of several distinct populations with different Gini coefficients.
Here is Taleb’s paper about the problems with measures like the Gini Coefficient.
If I understand Taleb correctly, his objection is that if X’s distribution’s upper tail tends to a power law with small enough (negated) exponent α, then sample proportions of X going to the distribution’s top end are inconsistent under aggregation, and suffer a bias that decreases with sample size. And since the Gini coefficient is such a measure, it has these problems.
However, Taleb & Douady give me the impression that the quantitative effect of these problems is substantial only when α is appreciably less than 2. (The sole graphical example for which T&D mention a specific α, their figure 1, uses α = 1.1). But I have a hard time seeing how α can really be that small for income & wealth, because that’d imply mean income & mean wealth aren’t well-defined in the population, which must be false because no one actually has, or is earning, infinitely many dollars or euros.
[Edit after E_N’s response: changed “a bias that rises with sample size” to “a bias that decreases with sample size”, I got that the wrong way round.]
Um no. They’re not well defined over the distribution, they will certainly be well defined over a finite population.
You seem to be confused about how distributions with infinite means work. Here’s a good exercise: get some coins and flip them to obtain data in a St. Petersburg distribution notice that even though the distribution has infinite mean all your data points are still finite (and quite small).
I’m lost. A statistical distribution characterizes a population (whether the population is an abstract construction or a literal concrete population); if the mean isn’t well-defined for the population it oughtn’t be well-defined for the distribution allegedly characterizing the population.
Taking annual income for concreteness, the support of a power law distribution would include, for example, $69 quadrillion. But no one actually earns so much (global economic activity, denominated in dollars, is simply too small), so the support of the actual annual income distribution must exclude $69 quadrillion. Consequently the actual annual income distribution and the power law distribution cannot actually be the same distribution; they have different support.
In the case of the St. Petersburg distribution one defines an abstract data-generating process which, by construction, implies a particular distribution with infinite mean. In the case of people’s incomes or wealth, by contrast, we know that the output of the data-generating process is constrained from above by the size of the economy, so the resulting population (and the distribution representing that population) must have finite mean income and finite mean wealth. (It’s as if we were talking about an imperfect real-life instantiation of the St. Petersburg process where we knew the casino had a limited amount of money.)
Every actual population differs from a parameterised mathematical function with few parameters, and for pretty much anything you can measure, if the mathematical distribution has infinite support, there will be some reason that the population cannot. But the question to ask is not, are they different, but, does the difference make a difference?
The way to answer this question is to repeat the analysis in the paper Eugine cited using a truncated power law. The bounds must be placed at the limits of what is possible, not at the accidental maximum and minimum values observed in the current population, as the point here is that the population is not fully exploring the tails.
I have not done this, but I did once do a simulation for the Cauchy distribution (which has no mean), finding empirically the standard deviation of the mean of samples of size N. Each individual set of N values has a mean, but they will be wildly different for different samples. Increasing N does not reduce the effect for any practical value of N (and I did this in Matlab, which is optimised for fast number-crunching on arrays). This is completely different from what happens for sample means drawn from distributions with finite mean and variance, whose means converge with increasing N to the population mean.
For my experiment with the Cauchy distribution, not a single one of my samples had to be rejected due to exceeding the limits of finite precision arithmetic. The absence of infinite tails from the samples made no difference to the experimental results, even though it is the presence of those infinite tails that gives the Cauchy distribution its lack of moments.
This may look like a paradox. You have two distributions, the Cauchy distribution and its truncation at 1e50 or wherever. The former has no moments, and the latter does. Yet the empirical behaviour of samples drawn from the latter agrees with mathematical analysis of the former, even though in the latter case the standard deviation of the sample mean must converge with increasing sample size to zero, and in the former case it remains infinite.
The resolution of this paradox lies in the fact that as the variance of a distribution that has a finite variance becomes larger and larger, the rate of convergence of sample means becomes slower and slower. For the Cauchy distribution truncated at +/- X and a sample size of N, for large X and N the variance of the sample mean is proportional to X/N. If we take the limit of this as X goes to infinity, we get infinity, independent of N. If we take the limit as N goes to infinity we get zero, independent of X. The behaviour found when both X and N are finite will depend on which is bigger. When X is very large, even the entire population (conceived as a sample from an underlying data-generation process) may not give a good estimate of the distribution mean.
Taleb and Douady’s point is that for a power law distribution, wealth owned by the top 1% is subject to this phenomenon. A larger population will explore more of the tail of the distribution, and unlike the normal distribution, the tail is fat enough to give a different value for the statistic. The “true” distribution does not have to actually have infinite support, for the entire population of a country to be insufficient to explore the tails.
The authors draw the implication that as both population and technological development grow, the top 1% will be found to have larger proportions of the wealth, not because of any change in the mechanisms of society to favour them, but because more of the sample space is being explored. “So examining times series, we can easily get a historical illusion of rise in wealth concentration when it has been there all along.” (Presumably one could quantify the effect and correct for it.)
A possibility that the paper does not raise is that instead of calculating the actual wealth held by the actual top 1%, you could estimate the Gini coefficient from the whole population, and calculate a theoretical 1% wealth. This may be substantially more. The authors suggest that Pareto’s empirical observation of the 80⁄20 rule, which implies 53% wealth held by the top 1%, might actually correspond to a figure of 70%.
This could be spun in opposite ways. If you want to boom freedom and boo levellers, you can point to this and say there’s always more room at the top. If you want to boom equality and boo the rich, you can say that the true situation is even worse than the 1% figure says, indeed that the figure is a systematic underestimate, a piece of evil propaganda used by the rich to conceal the true extent of the inequality inherent in the system.
Take your pick.
Thanks for your detailed reply, which seems more responsive to me than Eugine_Nier’s.
Yes, just so.
OK, so, I think this has helped me pinpoint the root of the disagreement here: we have different beliefs about what the relevant statistical population is. Here are four possibilities, in increasing order of expansiveness.
1. The statistical population is whatever big sample one can get one’s hands on.
2. The statistical population is the literal population of (working-age? adult?) people in a place of interest at a given time.
3. The statistical population is that implied by “a truncated power law”, with its bounds “placed at the limits of what is possible”.
4. The statistical population is that implied by a power law with only a lower bound.
(1) is what E_N apparently believes I endorse, even though it’s obviously a stupid choice based on confusing sample & population. I actually contend (2). You propose (3). T&D seem to assume (4).
In other circumstances one could simply elide the differences between these distributions, as there’d be no reason to expect the precise choice to matter much; the analysis being carried out would be insensitive to it. But here T&D’s analysis rests on a particular subtle, pathological feature, and the chance of that feature being present likely depends on which population one chooses.
Given that a choice should be made, I think (2) is a more natural, sensible choice than (3) & (4) for the purposes of estimating inequality at a given point in time, because (2) refers to the actually existing population of interest. (4) is a mathematical abstraction which might be a useful approximation in other circumstances, but risks spuriously introducing a pathological feature here. (3) better matches reality (having an upper bound) but is less parsimonious and harder to operationalize; how do we determine “the limits of what is possible”? And which upper limit ought one use — the maximum individual income/wealth that one’d witness if one could see into the pre-ordained future of humanity, or the maximum individual income/wealth possible under some counterfactual ensemble of futures, or...?
We might choose (3) or (4) in spite of these issues if we wished to predict the future course of inequality, because then we’d need to go beyond people who currently exist (and could simply have their incomes observed) and start modelling the distribution of incomes which haven’t been earned yet. But if we’d just like an index of current (or past) inequality, our interest is in the population of people who exist now (or existed at a given time in the past).
If God handed a data file with the income of everybody on the planet to an economist, and the economist used that to calculate some inequality index, the economist wouldn’t wring their hands about that index being a noisy sample statistic; they’d consider it the precise population value of the Gini coefficient (or whatever coefficient) for the world, and mark that job as done. (At least until they had to produce next year’s statistics, and needed fresh data!) In T&D’s notation, the number they’d come up with wouldn’t be κ-hat but κ. Complaints about κ-hat systematically diverging from κ would therefore be irrelevant.
Returning to what you wrote, yes, the way to answer the question is to repeat T&D’s analysis with a different distribution — but if I used a truncated power law it would be because it matched the empirical distribution of income/wealth well. (It would also be useful to see how κ-hat and κ differed under different sampling strategies; economists often deliberately try to oversample the upper end of the wealth distribution by using tax data or rich lists, and I’d expect that to lessen the small-sample bias T&D identify.)
I introduced (3) to demonstrate that it has almost the same observational properties as (4). If the bounds are drawn widely enough that no member of a population of the given size drawn from the full distribution is likely to exceed the truncation, then that population has exactly the same properties as if drawn from the full distribution.
The proper concept of “population” is an important issue in statistics, the two main concepts being “all actually existing examples” (your (2)) and “all hypothetical examples that could be produced by the causal process responsible for creating the existing examples, in the proportions that they would be produced” ((3) and (4)). Each of these concepts has its uses, but both are called “the population”. This can produce confusion. There is no such thing as the “right” concept, only the concept that is relevant to whatever the context is.
I believe that some statisticians argue that the “hypothetical” concept of population is moonshine, but I don’t know if that view has a serious following. (I am not a statistician.) WWJS? (What Would Jaynes Say?)
If you just want to describe the existing population, e.g. finding what proportion of the wealth is currently owned by the currently richest 1% of the population, then you are talking about the first. If you want to make predictions about other populations, e.g. what proportion of the wealth will be owned by the top 1% if the population doubles and the Gini coefficient remains the same, then the hypothetical concept is involved: you have to calculate the expected value of the top centile on the basis that the new population is a random sample from the full distribution.
The practical point made by the Taleb and Douady paper, put in terms of observations of actual populations, is that for a fixed value of the Gini coefficient, as the population grows, the actual top 1% will come to own an increasing fraction of the total wealth. This will happen not because of any change in the causal mechanisms by which people acquire differing wealth, but just because the population is larger and is statistically likely to explore more of the fat tail. Observing such an increase therefore cannot be used to argue that the rich are being increasingly favoured, unless a correction is first made to account for this effect.
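A rough way to see this numerically: hold the "mechanism" fixed (here I assume an exact Pareto with tail index α = 1.1, the value T&D use in their figure 1; the code is my sketch, not their computation) and simulate populations of increasing size. The distribution value of the top-1% share for α = 1.1 is 0.01^(1 − 1/1.1) ≈ 0.66; small populations fall well short of it, and larger ones creep toward it.

```python
import random
import heapq

def pareto_draw(alpha, rng):
    """One draw from a Pareto distribution with tail index alpha (minimum 1)."""
    return rng.random() ** (-1.0 / alpha)

def mean_top_share(alpha, n, p, reps, seed=0):
    """Average, over `reps` simulated populations of size n, of the fraction
    of total wealth held by the top p fraction of that population."""
    rng = random.Random(seed)
    shares = []
    for _ in range(reps):
        wealth = [pareto_draw(alpha, rng) for _ in range(n)]
        k = max(1, int(p * n))
        shares.append(sum(heapq.nlargest(k, wealth)) / sum(wealth))
    return sum(shares) / reps

# Same alpha throughout, i.e. the same "mechanism"; only the population grows.
for n in (1_000, 10_000, 100_000):
    print(n, mean_top_share(alpha=1.1, n=n, p=0.01, reps=50))
```

The upward drift in the printed shares is the effect in question: nothing about the generating process changed between rows.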
As a footnote, I also note that the paper does not consider the issue of how accurately the power law fits the distribution of wealth, especially at its extremes. By definition, if the entire population of the country has only explored a certain way into the tail, it provides no evidence for how far the power law distribution continues beyond that point. One might justify the use of the power law as being some sort of maxent prior (I don’t know if it is), but that amounts to the same thing: a profession of maximal ignorance about the whole distribution, conditional on having observed the actual population, and one would have to be willing to update to a different law on discovering that the tail actually stopped. If one was doing science, and had a well-understood mechanism that could be demonstrated to yield a power law distribution, then one would have more solid grounds for applying it.
I agree with every paragraph there but one. Unfortunately the paragraph I disagree with is the important one.
If that is the practical point made by T&D — and I’m not sure it is — it seems to me fallacious. Doesn’t it amount to saying, “the parameters baked into the underlying data-generating process we posit may be constant, so who cares that the actually existing level of inequality is increasing?” It confuses what I really care about (actual inequality) with the imperfect model (the power law) of what I really care about.
There’s also a risk of equivocating about what “the rich are being increasingly favoured” means. If it means “the power-law-producing process which, by assumption, decides people’s incomes & wealth is itself changing over time to favour the rich”, then that would be falsified by the underlying power law being constant over time. If it actually means instead e.g. “the economy will distribute a growing proportion of its output to the rich” or “underlying causal economic processes enable positive feedback loops that engender Matthew effects for income and/or wealth”, then it is quite consistent with the underlying power law being constant over time.
Taleb is not talking about what one might care about, but pointing out some mathematical facts about fat-tailed distributions (distributions with a power-law tail) and quantiles. The population top centile, he shows, can be a substantially biased underestimate of the distribution top centile, even for very large populations (100 million).
Among the consequences of this is that if you take ten countries, each with the same value for the top centile, and aggregate them, the top centile of the combined population may be substantially larger. In that situation, what is the “real” fraction of wealth of the top 1%? The value for any individual country, or the value for the aggregate? Would the answer depend on whether the countries were all economically isolated from each other?
I’ve never understood what people find so objectionable about inequality. Poverty, yes, inequality, no. To me, what is wrong with some people being rich while others starve is only that some starve. It bothers me not at all that among those who are not poor, some are vastly wealthier than others, even though I am not one of them.
If your objection to the richness of the rich is a claim that it is causally responsible for the poorness of the poor, then you are interested in underlying mechanisms described by the distribution.
The practical import of those mathematical facts lies ultimately in their relevance to features of the concrete world. If they do not bear on features of the concrete world which you or I care about, then they may have intrinsic beauty as mathematical results, but it would be mere misleading mathsturbation to present them as practically significant, as T&D do.
Whichever “is relevant to whatever the context is”. If the actual region of interest were the aggregate, the relevant value would be that for the aggregate.
The mechanisms entail a particular distribution, or family of distributions, and in this case the mechanisms are underspecified given the distribution. (Hence my pointing out that the phrase “the rich are being increasingly favoured” “is quite consistent with the underlying power law being constant over time”.)
“Increasingly favoured” is a statement about mechanisms. If the mechanisms remain the same as the population increases, then every individual who ends up rich is still working within the same mechanisms. It’s just that when the population increases, chance alone, operating on the same mechanisms, will produce many richer individuals.
Technically, the population top centile is a biased underestimate of the distribution value even for a thin-tailed distribution, but for those distributions the bias is unmeasurably small with country-sized populations.
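For contrast, repeating the same kind of experiment with a distribution whose moments are all finite shows essentially no dependence on population size. This sketch uses a lognormal with σ = 1 as the stand-in (my choice of distribution and parameters, purely for illustration):

```python
import random
import heapq

def mean_top_share_lognormal(sigma, n, p, reps, seed=0):
    """Average top-p wealth share over `reps` simulated populations of size n,
    with lognormal wealth (no power-law tail; all moments finite)."""
    rng = random.Random(seed)
    shares = []
    for _ in range(reps):
        wealth = [rng.lognormvariate(0.0, sigma) for _ in range(n)]
        k = max(1, int(p * n))
        shares.append(sum(heapq.nlargest(k, wealth)) / sum(wealth))
    return sum(shares) / reps

# Unlike the Pareto case, the top-1% share barely moves as n grows.
for n in (1_000, 100_000):
    print(n, mean_top_share_lognormal(sigma=1.0, n=n, p=0.01, reps=20))
```

A 100-fold increase in population barely shifts the statistic, which is the sense in which the bias is "unmeasurably small" off the fat tail.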
When you personally state it, or in general when it’s stated by anyone? If the former, fair enough. If the latter, I disagree (cf. the last paragraph of an earlier comment). As a down-to-earth example, if I lost a game of Monopoly to you and afterwards ruefully remarked that “the dice were favouring you more & more towards the end”, I would not automatically be accusing you of having swapped the original dice for a different set of dice partway through the game.
Right (accepting arguendo the premise that the relevant mechanisms imply a power law-like distribution with α appreciably less than 2). And in that situation one could react as T&D might, i.e. by arguing that since the mechanisms remain the same, the resulting increase in inequality is chimerical. Alternatively, one could react as I would, i.e. by arguing that an increase in some quantitative property of a concrete group of people does not magically become chimerical just because it arises from immutable mechanisms. (This is essentially the disagreement I laid out before between definitions (3) & (4) and definition (2) of the relevant statistical population.)
Ok, those words could be used for either meaning—to the confusion of discussion. T&D do say:
which could also be read in either sense. But the population centile and the mechanisms whereby people get rich are what they are. They are both real things, the divergence of which does not make either less real than the other.
Taleb would probably object on the grounds that the above will lead to misleading results if the population is actually a superposition of several distinct populations with different Gini coefficients.
His paper does go into these and other elaborations of the basic point.