A linearity fallacy is committed when you incorrectly assume two variables are related by a linear relationship. For example, assuming that adding people to a project decreases the time to finish proportionally[1] or that happiness scales linearly with income. This kind of fallacy, where you mistakenly draw an x-axis and a y-axis and a straight line going from the bottom-left to the top-right is a linearity fallacy that happens all the time.
But there’s more. One day, you’re walking along, and you decide to take two measurements of some property at different instances and get two real numbers, x and y. Numbers are numbers, right? Wait, before you go on, you should know that real numbers are powerful objects that rely on many hidden assumptions such as… wait, did you just average the two numbers?
Average Speed
“If a car’s speed is 30 mph for 1 mile and 60 mph for the next, what was the average speed over the two miles?”
If you simply average the two numbers given, you arrive at an intuitive, but incorrect 45 mph. Instead, the first mile takes 2 minutes, and the second takes 1 minute, so 2 miles in 3 minutes is 40 mph. The average speed (defined as total distance over total time) is not actually a linear quantity with respect to each mile. It is a linear quantity with respect to each minute. The speed for each minute is 30 mph, 30 mph, 60 mph with a total average of 40 mph. For miles, you want the inverse speed: the average of 1⁄30 and 1⁄60 is, conveniently, 1⁄40.
For this problem, the speed metric is warped, whereas the inverse speed is straight and even. It’s like you looked at a mirror, plotted two points on the mirror, and took the midpoint… but the mirror was a funhouse mirror. On the speed number line, 30 to 45 to 60 looks evenly spaced, but 45 is actually closer to 60 than it is to 30. For inverse speed, 1 unit is 1 unit everywhere. On warped number lines, points are not evenly spaced, and on warped mirrors, lines don’t even have to be straight. Warped mirrors are everywhere, and they look like real ones.
How is this a linearity fallacy?
A linear map[2] satisfies two properties (addition and scalar multiplication), which can be combined into just f(ax+by)=af(x)+bf(y). If we label the two miles as Mand N, the whole road as R, and the speed function as f, then the intuitive but incorrect computation is
f(R)=f(M/2+N/2)=(f(M)+f(N))/2
Explicitly, the two-dimensional[3] input variable being road-type with M=(1,0), N=(0,1), R=(0.5,0.5), and the output variable being speed. A multi-dimensional linearity fallacy!
Linearity Assumptions
Linearity is a strong assumption. But within that, there are many weaker assumptions. Once you measure something and obtain numbers, you can compare, average, add, subtract, multiply, divide, and all sorts of other things. Here are a few common weaker linearity assumptions that are not as obvious.
Note: it is also common to commit a “correlation implies causation” fallacy whilst assuming linearity, but linearity fallacies cannot simply be reduced to mistaking correlation for causation. And as always, simply making an assumption does not necessarily make it wrong nor does it being wrong necessarily make all conclusions false.
Averaging
The averaging assumption is when you assume that you can average numbers. This is presumably, to then do something with those numbers, or else why average them? These all have an averaging assumption:
You would pay any finite amount of money to play a coin-flip game where the payout starts at $1 and doubles for each head in a row you flip.[4]
You are willing to take a 2.5x or nothing bet at any bet size, because it’s +EV.[5]
You run a round-robin tournament, and pick a winner based on average win rate.
You give students a test with 100 questions, and grade them based on accuracy.
You take different ML models, and score them using accuracy on n questions (yes, even drawn IID). Or you take the average score across n different tasks.[6]
You allow users to score X on a 10-point rating scale and display the average rating.
You conclude that the X is the best chess opening, because it has the highest win rate against most opponents.
You run an experiment on 100 athletes and show a large average improvement in some important metric.
You compromise in some disagreement by meeting in the middle.
Sometimes, linear averaging may be violated when your numbers are on the wrong scale. Why speed and not inverse-speed, or wealth and not log-wealth, or any other such transformed scale? It may also be violated when items being averaged are not alike. Does every question, every task, every opponent, or every win matter the same?
Monotonicity
You assume monotonicity when you assume that things are roughly ranked by the metric you choose. Some examples:
More people in a project (or, cooks in the kitchen[7]) makes the end product faster or better.
More hours worked in a week means you get more done.
Money buys happiness.
In many cases, the relationship is actually an inverse-U shape. In that case, there is some optimal amount of the input quantity, and it can be much lower than you think. Some relationships, like between money and happiness, are likely actually monotonic (roughly, on average, at the population level, and across scales you care about) despite not being linear in the metric you chose. Happiness correlates with income at the population level, just not linearly, but it does with log-income across large scales[8].
Summing
The summing assumption is when you assume you can add numbers from different sources and use these a point of comparison. For example:
You are impressed by a company’s collective 100 years of total experience.
You assume you are winning in a chess game because you have a material advantage on the board.
Summing differs from averaging when the number of items being summed is not constant.
Distance
The distance assumption is when you assume you can take and compare absolute distances. For example:
You think method X is more impressive it boosted accuracy by 10 percentage points.
You assume that the last 10% of the project will also take 10% of the time.
You are more swayed by an improvement in MPG from 30 to 40 than 10 to 11.
Ratio
The ratio assumption is when you assume you can take and compare relative distances. For example:
You are worried because the risk of X increased by 300%.
You are more swayed by an improvement in MPG from 30 to 40 than 10 to 11. (Not a mistake that I included this twice!).[9]
You think that planes should fly in straight lines on a map.
You add velocities of two objects flying away from each other.
On Linearity
Linearity is simple. Applying multiple linear transformations in succession results still in only a linear transformation. Repeatedly doing something so simple achieves no useful work. But since most functions can be approximated linearly, it is often useful to assume linearity locally and simply change the measure when the approximation breaks. It’s definitely simpler.
On the other hand, linearity is an extremely strong assumption. It implies that one thing is equivalent to another or to a slice of the other, with little manipulation. Thus, it’s much more likely for two things to be non-linearly related than linearly.
In some sense, every relationship can be linearized if one is able to transform both sides of the equation. You look at the reciprocal speed, or you look at the log-plot, or you use Elo, or you throw a huge neural network with non-linearities at an array of pixels to get out a probability of cat vs dog. Sometimes it’s easy, and sometimes it’s extremely difficult. If you are able to state the relationship between two things so simply, it could be said that you truly understand them.
Linearity fallacies are common when one quantity in the relationship is obscured or hard to express. What do you mean by best X, how do you quantify happiness, why are you worried or impressed, what is the goal you actually care about? Though, if you could so easily express it, then isn’t it something you already truly understand?
I spent a while trying to figure out what these two variables that you mistakenly assume are linearly related. It’s weird because the two miles are two discrete sections, and any attempt to visualize miles on the x-axis vs speed on the y-axis didn’t work.
Linearity Fallacies
A linearity fallacy is committed when you incorrectly assume two variables are related by a linear relationship. For example, assuming that adding people to a project decreases the time to finish proportionally[1] or that happiness scales linearly with income. This kind of fallacy, where you mistakenly draw an x-axis and a y-axis and a straight line going from the bottom-left to the top-right is a linearity fallacy that happens all the time.
But there’s more. One day, you’re walking along, and you decide to take two measurements of some property at different instances and get two real numbers, x and y. Numbers are numbers, right? Wait, before you go on, you should know that real numbers are powerful objects that rely on many hidden assumptions such as… wait, did you just average the two numbers?
Average Speed
“If a car’s speed is 30 mph for 1 mile and 60 mph for the next, what was the average speed over the two miles?”
If you simply average the two numbers given, you arrive at an intuitive, but incorrect 45 mph. Instead, the first mile takes 2 minutes, and the second takes 1 minute, so 2 miles in 3 minutes is 40 mph. The average speed (defined as total distance over total time) is not actually a linear quantity with respect to each mile. It is a linear quantity with respect to each minute. The speed for each minute is 30 mph, 30 mph, 60 mph with a total average of 40 mph. For miles, you want the inverse speed: the average of 1⁄30 and 1⁄60 is, conveniently, 1⁄40.
For this problem, the speed metric is warped, whereas the inverse speed is straight and even. It’s like you looked at a mirror, plotted two points on the mirror, and took the midpoint… but the mirror was a funhouse mirror. On the speed number line, 30 to 45 to 60 looks evenly spaced, but 45 is actually closer to 60 than it is to 30. For inverse speed, 1 unit is 1 unit everywhere. On warped number lines, points are not evenly spaced, and on warped mirrors, lines don’t even have to be straight. Warped mirrors are everywhere, and they look like real ones.
How is this a linearity fallacy?
A linear map[2] satisfies two properties (addition and scalar multiplication), which can be combined into just f(ax+by)=af(x)+bf(y). If we label the two miles as M and N, the whole road as R, and the speed function as f, then the intuitive but incorrect computation is
f(R)=f(M/2+N/2)=(f(M)+f(N))/2Explicitly, the two-dimensional[3] input variable being road-type with M=(1,0), N=(0,1), R=(0.5,0.5), and the output variable being speed. A multi-dimensional linearity fallacy!
Linearity Assumptions
Linearity is a strong assumption. But within that, there are many weaker assumptions. Once you measure something and obtain numbers, you can compare, average, add, subtract, multiply, divide, and all sorts of other things. Here are a few common weaker linearity assumptions that are not as obvious.
Note: it is also common to commit a “correlation implies causation” fallacy whilst assuming linearity, but linearity fallacies cannot simply be reduced to mistaking correlation for causation. And as always, simply making an assumption does not necessarily make it wrong nor does it being wrong necessarily make all conclusions false.
Averaging
The averaging assumption is when you assume that you can average numbers. This is presumably, to then do something with those numbers, or else why average them? These all have an averaging assumption:
You would pay any finite amount of money to play a coin-flip game where the payout starts at $1 and doubles for each head in a row you flip.[4]
You are willing to take a 2.5x or nothing bet at any bet size, because it’s +EV.[5]
You run a round-robin tournament, and pick a winner based on average win rate.
You give students a test with 100 questions, and grade them based on accuracy.
You take different ML models, and score them using accuracy on n questions (yes, even drawn IID). Or you take the average score across n different tasks.[6]
You allow users to score X on a 10-point rating scale and display the average rating.
You conclude that the X is the best chess opening, because it has the highest win rate against most opponents.
You run an experiment on 100 athletes and show a large average improvement in some important metric.
You compromise in some disagreement by meeting in the middle.
Sometimes, linear averaging may be violated when your numbers are on the wrong scale. Why speed and not inverse-speed, or wealth and not log-wealth, or any other such transformed scale? It may also be violated when items being averaged are not alike. Does every question, every task, every opponent, or every win matter the same?
Monotonicity
You assume monotonicity when you assume that things are roughly ranked by the metric you choose. Some examples:
More people in a project (or, cooks in the kitchen[7]) makes the end product faster or better.
More hours worked in a week means you get more done.
Money buys happiness.
In many cases, the relationship is actually an inverse-U shape. In that case, there is some optimal amount of the input quantity, and it can be much lower than you think. Some relationships, like between money and happiness, are likely actually monotonic (roughly, on average, at the population level, and across scales you care about) despite not being linear in the metric you chose. Happiness correlates with income at the population level, just not linearly, but it does with log-income across large scales[8].
Summing
The summing assumption is when you assume you can add numbers from different sources and use these a point of comparison. For example:
You are impressed by a company’s collective 100 years of total experience.
You assume you are winning in a chess game because you have a material advantage on the board.
Summing differs from averaging when the number of items being summed is not constant.
Distance
The distance assumption is when you assume you can take and compare absolute distances. For example:
You think method X is more impressive it boosted accuracy by 10 percentage points.
You assume that the last 10% of the project will also take 10% of the time.
You are more swayed by an improvement in MPG from 30 to 40 than 10 to 11.
Ratio
The ratio assumption is when you assume you can take and compare relative distances. For example:
You are worried because the risk of X increased by 300%.
You are more swayed by an improvement in MPG from 30 to 40 than 10 to 11. (Not a mistake that I included this twice!).[9]
Miscellaneous
Some miscellaneous examples:
You assume (x+y)n=xn+yn.[10]
You think that planes should fly in straight lines on a map.
You add velocities of two objects flying away from each other.
On Linearity
Linearity is simple. Applying multiple linear transformations in succession results still in only a linear transformation. Repeatedly doing something so simple achieves no useful work. But since most functions can be approximated linearly, it is often useful to assume linearity locally and simply change the measure when the approximation breaks. It’s definitely simpler.
On the other hand, linearity is an extremely strong assumption. It implies that one thing is equivalent to another or to a slice of the other, with little manipulation. Thus, it’s much more likely for two things to be non-linearly related than linearly.
In some sense, every relationship can be linearized if one is able to transform both sides of the equation. You look at the reciprocal speed, or you look at the log-plot, or you use Elo, or you throw a huge neural network with non-linearities at an array of pixels to get out a probability of cat vs dog. Sometimes it’s easy, and sometimes it’s extremely difficult. If you are able to state the relationship between two things so simply, it could be said that you truly understand them.
Linearity fallacies are common when one quantity in the relationship is obscured or hard to express. What do you mean by best X, how do you quantify happiness, why are you worried or impressed, what is the goal you actually care about? Though, if you could so easily express it, then isn’t it something you already truly understand?
https://en.wikipedia.org/wiki/The_Mythical_Man-Month
https://en.wikipedia.org/wiki/Linear_map
I spent a while trying to figure out what these two variables that you mistakenly assume are linearly related. It’s weird because the two miles are two discrete sections, and any attempt to visualize miles on the x-axis vs speed on the y-axis didn’t work.
https://en.wikipedia.org/wiki/St._Petersburg_paradox
https://en.wikipedia.org/wiki/Kelly_criterion
e.g. https://super.gluebenchmark.com/ or https://github.com/google/BIG-bench
https://en.wiktionary.org/wiki/too_many_cooks_spoil_the_broth
https://www.pnas.org/doi/10.1073/pnas.2208661120
http://www.mpgillusion.com/p/what-is-mpg-illusion.html
https://en.wikipedia.org/wiki/Freshman%27s_dream