The thin line (supposed to be purple, not gray) is an exponentially weighted moving average. It’s what’s recommended in The Hacker’s Diet [http://dreev.es/hackdiet] as a way to keep from freaking out about the day-to-day fluctuations in your weight. As long as all your datapoints are below that line, you’re inexorably trending downward.
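For concreteness, a minimal sketch of that kind of exponentially weighted moving average (assuming the 0.1 smoothing factor the book uses; the function name is mine):

```python
def hackers_diet_trend(weights, alpha=0.1):
    """Exponentially weighted moving average of daily readings: each day the
    trend moves a fraction alpha of the way toward that day's measurement."""
    trend = weights[0]
    out = [trend]
    for w in weights[1:]:
        trend += alpha * (w - trend)  # move 10% of the way toward today's reading
        out.append(trend)
    return out
```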
The “rose-colored dots” are an attempt at a more normal-person-friendly version of that. It’s a transformation of your data that is as monotonic as possible while keeping each transformed datapoint (the rose-colored ones) within something like a standard deviation of the actual measured datapoint.
There’s also the blue-green aura around your datapoints, which is a very thick polynomial regression on your data.
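Roughly speaking (this is only an illustration; the degree and the band width here are arbitrary assumptions, not Beeminder’s actual parameters), such a band could be produced like this:

```python
import numpy as np

def aura(days, weights, degree=3):
    """Illustrative polynomial-regression band around noisy datapoints."""
    days, weights = np.asarray(days, float), np.asarray(weights, float)
    coeffs = np.polyfit(days, weights, degree)   # fit a low-degree polynomial
    fit = np.polyval(coeffs, days)
    spread = np.std(weights - fit)               # residual spread sets the thickness
    return fit - spread, fit + spread            # lower and upper edges of the band
```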
All of these things are an attempt to show you your true trend.
(Also, they only apply to goals like weight loss where the measurements are noisy.)
Thank you very much...
One thing I don’t like about John Walker’s algorithm is that it gives too much weight to the very first data point, and to data before a ‘break’, so that if you only report your weight n times every N days (as I do, because I don’t have a scale in the flat where I’m staying so I only weigh myself when I go back home on weekends—I know that makes the whole thing a lot harder), the trend line will change about N/n times as slowly as it should. I prefer this algorithm:
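Roughly, a gap-aware exponential average: every reading is discounted by how many days ago it was taken and the result normalized, so a lone reading after a long gap carries about as much weight as a string of daily readings over the same span would. A sketch along those lines (a reconstruction, not the original code; the 0.9-per-day decay is chosen to match Walker’s 0.1 smoothing factor in the daily-data limit):

```python
def gap_aware_trend(times, weights, decay=0.9):
    """Exponential average where each reading is discounted by decay**(days ago),
    then normalized; for long runs of daily data this reduces to
    trend += 0.1 * (w - trend)."""
    num = den = 0.0
    trend = []
    prev_t = times[0]
    for t, w in zip(times, weights):
        fade = decay ** (t - prev_t)        # discount everything seen so far by the gap
        num, den = num * fade + w, den * fade + 1.0
        trend.append(num / den)
        prev_t = t
    return trend
```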
(which is equivalent to Walker’s algorithm if you report your weight every day and have done so for a while).
Nice, thanks! Is that by chance equivalent to what this page is suggesting: http://stackoverflow.com/questions/1023860
It is equivalent to the answer by yairchu of Jun 21 ’09 at 15:53, as far as I can tell.
I just had another idea (loosely inspired by the Glicko rating system): suppose that a person on Day 0 has an unknown “true weight” W_0, but because of measurement errors, an unknown amount of body water, etc., the scale reads w_0, which is normally distributed with mean W_0 and variance σ^2; suppose also that if we knew W_0 we would assign W_1 (the “true weight” on Day 1) a probability distribution with mean W_0 and variance c^2(t_1 − t_0). Now, if our probability distribution for W_n is a Gaussian with mean u_n and variance σ_n^2, then upon seeing the measured weight w_n we would update it to mean (u_n/σ_n^2 + w_n/σ^2)/(1/σ_n^2 + 1/σ^2) and variance 1/(1/σ_n^2 + 1/σ^2). Hence:
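In code, that update looks something like the following (a sketch reconstructed from the description just above, using the variable names the next paragraph refers to; taking sigma_sq = 1 and c_sq = 1/90 as defaults is an assumption, chosen so the daily-data limit reproduces the 0.1 smoothing factor):

```python
def smoothweight(times, weights, sigma_sq=1.0, c_sq=1.0 / 90):
    """Trend = posterior mean of the 'true weight' under the model above.
    Only the ratio c_sq/sigma_sq matters; 1/90 matches the Hacker's Diet
    0.1 factor when data come in every day."""
    u, sigman_sq = weights[0], sigma_sq       # posterior after the first reading
    trend = [u]
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        sigman_sq += c_sq * dt                # the true weight drifts by c_sq per day
        u = (u / sigman_sq + weights[i] / sigma_sq) / (1 / sigman_sq + 1 / sigma_sq)
        sigman_sq = 1 / (1 / sigman_sq + 1 / sigma_sq)
        trend.append(u)
    return trend
```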
(Multiplying sigma_sq and c_sq by the same constant doesn’t affect the values of smoothweight, as far as I can tell.) In the limit of data on a large number of consecutive days, sigman_sq approaches 0.1 and the algorithm becomes equivalent to the other ones. I’ve tried this with my own gapped data and the trend line changes faster than with the Hacker’s Diet algorithm but not as fast as with my old algorithm. But I now prefer this one, because its rationale looks more like a derivation from first principles than like someone pulling stuff out of their ass.
(Is there a way of getting real subscripts and superscripts?)
Also, σ^2 and c^2 could in principle be found empirically: in this model, the difference between measured weights t days apart is normally distributed with variance (2σ^2 + tc^2). I found a file with a couple of years’ worth of almost-daily weight data of mine from a few years ago and computed the average of (w_n − w_(n − t))^2 for various values of t, and for not-too-large values of t that average is indeed approximately linear in t (except that it is slightly lower at multiples of 7 days, which I take to be an effect of weekly cycles—I tend to eat more on weekends). But the ratio between the c^2 and the σ^2 I found was nowhere near 1⁄90 (the ratio at which the daily-data limit reproduces the Hacker’s Diet 0.1 smoothing factor): it was actually about 1⁄5, which suggests that the Hacker’s Diet smoothed average responds to changes in weight much more slowly than it should, even if the weight is reported daily.
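For what it’s worth, that check amounts to an ordinary least-squares line fit: the mean squared difference as a function of the lag t should have slope about c^2 and intercept about 2σ^2. A sketch (my reconstruction; the integer day indexing and the 14-day maximum lag are arbitrary choices):

```python
import numpy as np

def estimate_variances(days, weights, max_lag=14):
    """Estimate sigma^2 and c^2 from Var(w_n - w_{n-t}) = 2*sigma^2 + t*c^2
    by fitting a line to the mean squared difference versus the lag t."""
    by_day = dict(zip(days, weights))          # day number -> measured weight
    lags, msd = [], []
    for t in range(1, max_lag + 1):
        sq = [(by_day[d] - by_day[d - t]) ** 2 for d in by_day if d - t in by_day]
        if sq:
            lags.append(t)
            msd.append(sum(sq) / len(sq))
    c_sq, two_sigma_sq = np.polyfit(lags, msd, 1)   # slope ~ c^2, intercept ~ 2*sigma^2
    return two_sigma_sq / 2.0, c_sq
```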
(Will anyone bother to find out the formula for the ideal Bayesian estimate of c^2 and σ^2 in this model, assuming uninformative priors?)