I once saw an advert claiming that a pregnancy test was “over 99% accurate”. This inspired me to invent an only-slightly-worse pregnancy test, which is over 98% accurate. My invention is a rock with “NOT PREGNANT” scrawled on it: when applied to a randomly selected human being, it is right more than 98% of the time. It is also cheap, non-invasive, endlessly reusable, perfectly consistent, immediately effective and impossible to apply incorrectly; this massive improvement in cost and convenience is obviously worth the ~1% decrease in accuracy.
Also see Scott Alexander’s Heuristics That Almost Always Work.
I think they meant over 99% when used on a non-randomly selected human who’s bothering to take a pregnancy test. Your rock would run maybe 70% or so on that application.
This is a general problem with accuracy as a measure. In binary classification, with two events A and B, “accuracy” is broadly defined as the probability of the “if and only if” biconditional, P(A↔B), which is equivalent to P((A∧B)∨(¬A∧¬B)). It is the probability of both events having the same truth value, i.e. of either both being true or both being false.
In terms of diagnostic testing, it is the probability that the test is positive if and only if the tested condition (e.g. pregnancy) is present.
The problem with this is that the number is strongly dependent on the base rates. If pregnancy is rare, say it has a base rate of 2%, the accuracy of the rock test (which always says “not pregnant”, i.e. is always negative) is
P(test positive ∧ pregnant) + P(¬test positive ∧ ¬pregnant) = 0% + 98% = 98%.
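To make the arithmetic concrete, here is a minimal sketch in Python, using the made-up 2% base rate from above:

```python
# Rock "test": always answers "not pregnant", i.e. never positive.
p_pregnant = 0.02  # assumed base rate from the example above

p_positive_and_pregnant = 0.0                  # the rock never says "pregnant"
p_negative_and_not_pregnant = 1 - p_pregnant   # every non-pregnant person is labelled correctly

accuracy = p_positive_and_pregnant + p_negative_and_not_pregnant
print(accuracy)  # 0.98 -- 98% accuracy without measuring anything
```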
Two better measures are the Pearson/phi correlation, which ranges from −1 to +1, and the odds ratio, which ranges from 0 to +∞ but can also be rescaled to [−1, +1], in which case it is called Yule’s Y.
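Concretely, given a 2×2 table of joint probabilities a = P(A∧B), b = P(A∧¬B), c = P(¬A∧B), d = P(¬A∧¬B), the phi correlation is (ad − bc)/√((a+b)(c+d)(a+c)(b+d)), the odds ratio is ad/bc, and Yule’s Y is (√OR − 1)/(√OR + 1). Here is a minimal sketch in Python; the helper name and the 99%-sensitivity/99%-specificity test below are made-up illustrations, not numbers from the advert:

```python
import math

def association(a, b, c, d):
    """Phi correlation, odds ratio, and Yule's Y for a 2x2 table of
    joint probabilities: a=P(A∧B), b=P(A∧¬B), c=P(¬A∧B), d=P(¬A∧¬B)."""
    phi = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    odds_ratio = math.inf if b * c == 0 else (a * d) / (b * c)
    # Yule's Y = (sqrt(OR) - 1) / (sqrt(OR) + 1), written so that bc = 0 is safe.
    yule_y = (math.sqrt(a * d) - math.sqrt(b * c)) / (math.sqrt(a * d) + math.sqrt(b * c))
    return phi, odds_ratio, yule_y

# A non-rock test with 99% sensitivity and 99% specificity at a 2% base rate
# (made-up numbers, chosen to mimic the advert's "over 99%").
base = 0.02
a = base * 0.99         # positive and pregnant
b = (1 - base) * 0.01   # false positive
c = base * 0.01         # false negative
d = (1 - base) * 0.99   # negative and not pregnant
print(association(a, b, c, d))  # phi ≈ 0.81, OR = 9801, Y = 0.98
```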
Both correlation and Yule’s Y are 0 when the two events are statistically independent, but they differ in when they reach their maximum and minimum values. Correlation is 1 if the events always co-occur (each implies the other), and −1 if exactly one of them always occurs (each implies the negation of the other, and the negation of each implies the other). Yule’s Y is 1 if at least one event implies the other, or the negation of at least one implies the negation of the other. It is −1 if at least one event implies the negation of the other, or the negation of at least one event implies the other.
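For example, reusing the sketch above with made-up numbers: if A implies B but B is more common than A, Yule’s Y already sits at its maximum while the correlation is held below 1 by the mismatched base rates:

```python
# A implies B (no mass on A∧¬B), but P(A) = 0.2 while P(B) = 0.5.
a, b, c, d = 0.2, 0.0, 0.3, 0.5
phi, _, yule_y = association(a, b, c, d)
print(phi)     # ≈ 0.5 -- held below 1 by the unequal base rates
print(yule_y)  # 1.0  -- a one-sided implication is enough for the maximum
```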
This also means that correlation still depends on the base rates (e.g. the marginal probability of the test being positive, or of someone being pregnant): it can only be maximal if both events have equal base rates, and only minimal if the base rate of one event equals the base rate of the negation of the other. This is not the case for the odds ratio / Yule’s Y, which is purely a measure of statistical dependence. Another interesting fact: the correlation of two events is exactly equal to Yule’s Y if both events have base rates of 50%.
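That last fact is easy to check numerically with the sketch above: both base rates being 50% forces the table into the symmetric form a = d, b = c with a + b = 0.5, and both measures then reduce to (a − b)/(a + b):

```python
# Equal 50% base rates: a = d, b = c, a + b = 0.5 (values otherwise arbitrary).
a, b = 0.35, 0.15
phi, _, yule_y = association(a, b, b, a)
print(round(phi, 6), round(yule_y, 6))  # 0.4 0.4 -- identical, as claimed
```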