[LDSL#6] When is quantification needed, and when is it hard?
This post is also available on my Substack.
In the previous post, I discussed the possibility of doing ordinal comparison between different entities. Yet usually when we think about measurement, we think of it as involving putting numbers to things, not just ranking them.
There are quite a few methods to convert rankings into numbers, such as Elo scores or item response theory. Different such methods generally yield extremely similar results, and are often based on probabilities. For instance, in chess, if two players' abilities differ by 400 Elo points, then the odds of the better player winning are 10:1.
Are these methods reasonable? Can they go wrong?
Quantified ordinals are ~log scales
If you look at the formula for Elo scores, it involves an exponential function, so each time the Elo score changes by 400 points, the odds of winning change by a factor of 10 (by definition). A similar principle applies to the probability of solving a difficult task as a function of IQ, if one uses item response theory.
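To make this concrete, here is a minimal sketch of the standard Elo expected-score formula; the function name and ratings are my own illustration, but the 10^(difference/400) relationship is the one described above.

```python
def win_probability(elo_a, elo_b):
    """Expected probability that player A beats player B under the Elo model."""
    return 1 / (1 + 10 ** ((elo_b - elo_a) / 400))

# A 400-point gap gives 10:1 odds, i.e. a ~0.909 win probability:
p = win_probability(1800, 1400)
print(p)            # 0.909...
print(p / (1 - p))  # odds ratio: 10.0
```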
Another way to see it is to look at the practical outcomes. For instance, IQ appears to be exponentially related to income.
But I think the most fundamental way to understand it is that these methods assume that measurement error is independent of the measured quantity, such that there is the same amount of error in the ranking of the best as in the ranking of the worst. In linear scales, measurement error is usually proportional to the measured quantity, so in order for it to be independent, one must take the logarithm.
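As an illustration of that last point, here is a toy simulation (my own, not from any measurement literature) where the noise is proportional to the true value; the spread is wildly different across scales on the linear scale, but roughly constant once you take logs.

```python
import numpy as np

rng = np.random.default_rng(0)
true_values = [1.0, 10.0, 100.0, 1000.0]

for v in true_values:
    # Multiplicative noise: error proportional to the measured quantity.
    samples = v * rng.lognormal(mean=0.0, sigma=0.1, size=10_000)
    print(f"true={v:7.1f}  sd(linear)={samples.std():8.2f}  "
          f"sd(log)={np.log(samples).std():.3f}")
# sd(linear) grows with the true value; sd(log) stays ~0.1 throughout.
```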
Log scales need a base for addition
It might be tempting to think that a log scale is equivalent to a linear scale, since you can just take the exponential of it. However, there are many different exponential functions: 2^x, e^x, 10^x, ….
If n=a^x, and m=a^y, then nm=a^(x+y). So we can add log-scaled numbers, and this simply corresponds to multiplying their linearly-scaled values, even if we don’t know what base is most natural.
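A quick sanity check of this base-independence (the bases here are chosen only for illustration):

```python
import math

x, y = 3.0, 5.0
for a in (2, math.e, 10):
    n, m = a ** x, a ** y
    # Adding log-scaled numbers multiplies the linear values, whatever the base:
    assert math.isclose(n * m, a ** (x + y))
print("n*m == a**(x+y) for every base tested")
```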
However, there is no corresponding expression for n+m. We can at best give some limiting values, e.g. as (y-x)ln(a) goes to infinity, n+m approaches a^y. We can use this limit to approximate n+m as a^max(x, y), but this approximation fails badly when you start summing up tons of values of similar size. (In practical terms, a tank plus a speck of dust is equal to a tank, but an apple and an orange is more than just an apple or just an orange.)
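The following sketch (a toy comparison I added) contrasts the a^max(x, y) approximation with the true sum, first for two very different values, then for many values of similar size:

```python
import numpy as np

def true_sum(logs, base=10.0):
    return np.sum(base ** np.asarray(logs))

def max_approx(logs, base=10.0):
    return base ** np.max(logs)

# Tank plus speck of dust: the max approximation is nearly exact.
print(true_sum([6.0, 0.0]), max_approx([6.0, 0.0]))  # 1000001.0 vs 1000000.0

# A thousand values of similar size: the approximation is off by ~1000x.
logs = np.full(1000, 2.0)
print(true_sum(logs), max_approx(logs))              # 100000.0 vs 100.0
```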
Addition lets you infer large things from an enumeration of small things
You need addition to even know whether there seems to be a problem worth performing inference on in the first place. For instance, imagine that the prices of various goods have changed. If you have some idea of living standards that must be achieved, you can add up the prices of the goods needed to achieve these living standards, in order to see if the cost of living has changed.
Could you have done this purely multiplicatively, by e.g. taking the geometric mean of the relative changes in each type of good? (Or equivalently, by averaging the changes in the logarithms of the prices?) No, because the price of a few big things (e.g. housing) might have gone one way, while the price of many small things might have gone the other way. A geometric mean of relative changes would ignore the magnitude of the prices, and instead mainly consider the number of goods.
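Here is a toy cost-of-living basket (the prices are invented for illustration) showing how the additive total and the geometric mean of relative changes can point in opposite directions:

```python
import math

# (old price, new price): one big item up 20%, twenty small items down 10%.
basket = [(1000.0, 1200.0)] + [(10.0, 9.0)] * 20

old_total = sum(old for old, new in basket)
new_total = sum(new for old, new in basket)
print(new_total / old_total)  # ~1.15: the cost of living went UP

ratios = [new / old for old, new in basket]
geo_mean = math.exp(sum(math.log(r) for r in ratios) / len(ratios))
print(geo_mean)  # ~0.912: the geometric mean says prices went DOWN
```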
Picking the right base might not be trivial
Picking the right base might seem trivial. For instance, the definition of Elo scores seems to imply that n=10^(x/400). Given this linearization, the linear scores are directly proportional to the probability of winning a match: in a game between a player with Elo x, and a player with Elo y, if we let n=10^(x/400) and m=10^(y/400), then the probability of the first player winning is n/(n+m).
This is relatively sensible, but the issue is that it is not compositional; for instance, the scores will not be proportional to the probability of winning two matches. This probability would instead be (n/(n+m))^2 = n^2/(n^2 + m^2 + 2nm). Similarly, the Elo score presumably does not divide neatly when considering things like the probability of making a good move.
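A quick check (with ratings of my own choosing) that squaring the single-match probability does not preserve the n/(n+m) form:

```python
x, y = 1800, 1400
n, m = 10 ** (x / 400), 10 ** (y / 400)

p_one = n / (n + m)                        # probability of winning one match
p_two = p_one ** 2                         # probability of winning two matches
squared_score = n**2 / (n**2 + m**2)       # what a "squared-score" ratio would give

print(p_two)          # ~0.826
print(squared_score)  # ~0.990: the cross term 2nm is missing
```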
This lack of compositionality undermines the point of picking a base, because we wanted to pick a base in order to sum different things together.