It was widely verified that if you took a set of single-digit numbers and calculated their standard deviation, and then took the same numbers and added a million to each of them, the standard deviation was often different, when it should be exactly the same.
The numerically-unstable stdev is a classic. I’ve run into it in ML stuff. It happens when you try to be clever and compute the stdev in one pass through the data without computing the mean beforehand… Unless you use the really cool black-magic trick that lets you do this in a more stable way.
In an ML context this happens if you want the running stdev of a stream of data you can’t afford to store in memory (so you can’t find the mean and then go back and start from the beginning). Excel has no such excuse.
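The "black-magic trick" being alluded to is presumably Welford's online algorithm, which keeps a running mean and a running sum of squared deviations so the catastrophic cancellation of the naive one-pass formula never happens. A minimal sketch in Python (the function name and example data are mine, not from the thread):

```python
import math

def running_stdev(stream):
    """One-pass population stdev via Welford's online update."""
    n = 0
    mean = 0.0
    m2 = 0.0                          # running sum of squared deviations from the mean
    for x in stream:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)      # note: uses the *updated* mean
    return math.sqrt(m2 / n) if n else float("nan")

# Shifting every value by a constant should not change the stdev,
# and with Welford's update it (almost) doesn't:
data = [3, 1, 4, 1, 5, 9, 2, 6]
print(running_stdev(data))                            # population stdev of the digits
print(running_stdev([x + 1_000_000 for x in data]))   # essentially the same value
```

The point of the update is that it only ever squares deviations from the current mean, so the large offset cancels out before anything gets squared.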
As a software developer, I completely expect this; it's just an everyday consequence of limited-precision math. You'll get the same kind of anomalies in almost any programming language, and other kinds of calculations can produce far worse inaccuracies than this (it seems odd to single out standard deviations, which are relatively well-behaved, but maybe Excel used to have a worse algorithm). John Gustafson proposed alternative number formats, first unums and later posits, to mitigate the problem (preferably in hardware), but it's not practical to eliminate it entirely.
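For anyone who hasn't bumped into this before, here is the everyday limited-precision behaviour being described, in plain Python; nothing Excel-specific about it:

```python
# Doubles carry roughly 15-16 significant decimal digits, so adding a huge
# offset silently discards the low-order digits of a small number:
x = 0.1
print((x + 1e16) - 1e16)   # prints 0.0, not 0.1 -- the 0.1 was rounded away

# Even simple decimals are only approximated in binary floating point:
print(0.1 + 0.2 == 0.3)    # False: 0.1 + 0.2 is 0.30000000000000004
```

The same effect is what bites a naive "sum of squares minus square of sum" stdev once the values are shifted far from zero.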
This bug does not exist in Excel for Mac 2011.
Virtue points for checking! For the sake of thoroughness: how many values did you use? A comment from my FB wall suggests this comes from a hack for handling very large data sets.
Update: I just tried it with 100,000 values. Still no bug.
Update 2: I’ve now tried it with 1,000,000 values. Everything is still correct.
I tried a set of 104,857,600 values in Excel 2010 for Windows and the standard deviations diverge at the 9th decimal:
Count 1: 104,857,600
Count 2: 104,857,600
Sum 1: 5,242,888,545
Sum 2: 104,862,842,888,545
Average 1: 50.00008149
Average 2: 1,000,050.00008149
Stdev.p 1: 29.15276094096330000
Stdev.p 2: 29.15276094160490000
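For comparison outside Excel, here is a rough way to reproduce the same kind of divergence with NumPy, contrasting the naive one-pass formula against a two-pass calculation. The helper names are mine, the data is random rather than the set used above, and it's scaled down to 10 million values so it runs quickly; the exact decimal where the naive results start to disagree will vary.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000_000                        # scaled down from the 104,857,600 used above
x = rng.uniform(0.0, 100.0, size=n)   # values roughly in the same range as the test
y = x + 1_000_000.0                   # the same values shifted by a million

def naive_pstdev(a):
    """One-pass formula sqrt(E[a^2] - E[a]^2): fast but numerically fragile."""
    return np.sqrt(np.mean(a * a) - np.mean(a) ** 2)

def two_pass_pstdev(a):
    """Subtract the mean first, then average the squared deviations."""
    d = a - a.mean()
    return np.sqrt(np.mean(d * d))

print("naive,    unshifted:", naive_pstdev(x))
print("naive,    shifted:  ", naive_pstdev(y))      # drifts in the trailing decimals
print("two-pass, unshifted:", two_pass_pstdev(x))
print("two-pass, shifted:  ", two_pass_pstdev(y))   # agrees to many more digits
```

The two-pass version holds up under the shift because the offset is removed before squaring, which is presumably what the more recent Excel versions do as well.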
I used 100 values. How many is “very large”? 1,000? 1,000,000? More?