The explanation by owencb is what I was trying to address. To be explicit about when the offset is being added, I’m suggesting replacing your log1p(x) ≣ log(1 + x) transformation with log(c + x) for c=10 or c=100.
If the choice of log-dollars is just for presentation, it doesn’t matter too much. But in a lesswrong-ish context, log-dollars also have connotations of things like the Kelly criterion, where it is taken completely seriously that there’s more of a difference between $0 and $1 than between $1 and $3^^^3.
To be explicit about when the offset is being added, I’m suggesting replacing your log1p(x) ≣ log(1 + x) transformation with log(c + x) for c=10 or c=100.
Which will do what, exactly? What does this accomplish? If you think it does something, please explain more clearly, preferably with references explaining why +10 or +100 would make any difference, or even better, make use of the full data which I have provided you and the analysis code, which I also provided you, exactly so criticisms could go beyond vague speculation and produce something firmer.
(If I sound annoyed, it’s because I spend hours cleaning up my analyses to provide full source code, all the data, and make sure all results can be derived from the source code, to deal with this sort of one-liner objection. If I didn’t care, I would just post some coefficients and a graph, and save myself a hell of a lot of time.)
Here’s why it matters:
If we add 100 to everything, that transformation will be sized differently after we take the (base-10) log. 0s go from -infinity to +2, a jump of infinity (...plus 2, to the degree that makes any sense); 100s go from 2 to 2.3, a jump of .3. If we added 1 instead, 0s would go from -infinity to 0, and 100s would go from 2 to 2.004. If we added .01, 0s would go to −2, and 100s would go to 2.00004.
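To make the arithmetic easy to check, here is a minimal sketch in Python (not the R used for the original analysis). It assumes base-10 logs, which is what the numbers above imply; the helper name `shift` is made up for this illustration.

```python
import math

def shift(x, c):
    """Return (log10(x), log10(x + c)); log10(0) is treated as -infinity."""
    before = math.log10(x) if x > 0 else float("-inf")
    return before, math.log10(x + c)

# Offsets discussed in the comment: 100, 1, and 0.01.
for c in (100, 1, 0.01):
    for x in (0, 100):
        before, after = shift(x, c)
        print(f"c={c:<6} x={x:<4} log10(x)={before:<6} log10(x+c)={after:.5f}")
```

The point is only that the same offset moves zeros by an unbounded amount and moves large donations hardly at all.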
But what does that do to our trendline? Suppose that 40% of EAs gave 0, and 60% of non-EAs gave 0. Then when I calculate the mean difference in log-scale, the extra 20% of non-EAs whose score I can pick with my scaling factor is a third of the differing sample. The gulf between the groups (i.e. the difference between the trendlines) will be smaller if I choose 100 than if I choose 0.01. (I can’t pick a factor that makes the groups switch which one donated more, because of the order preservation property, but if I add a trillion to all of the donations, the difference between the groups will become invisible because both groups will look like a flat line, and if I add a trillionth to all of the donations, it’ll look much more like a graph of percent donating.)
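A rough sketch of that effect, again in Python. The 40% and 60% zero shares come from the example above; everything else is an assumption made for illustration, in particular that every donor in both groups gives the same made-up $100, and the function names `mean_log` and `gap` are hypothetical.

```python
import math

def mean_log(zero_share, donation, c):
    """Mean of log10(c + x) when zero_share of a group gives 0 and the rest give `donation`."""
    return zero_share * math.log10(c) + (1 - zero_share) * math.log10(c + donation)

def gap(c, donation=100.0):
    # 40% of EAs and 60% of non-EAs give 0 (from the example);
    # the $100 gift is a made-up value, identical for both groups.
    return mean_log(0.4, donation, c) - mean_log(0.6, donation, c)

for c in (1e-12, 0.01, 1, 100, 1e12):
    print(f"c={c:g}: EA minus non-EA gap in mean log10 = {gap(c):.4f}")
```

Under these assumptions the gap reduces to 0.2 × (log10(c + 100) − log10(c)), so it never changes sign but shrinks toward zero as c grows and widens as c shrinks.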
And so it seems to me that there are three potentially interesting comparisons: percent not donating by age for the two groups (it seems likely EA will have fewer non-donors than non-EA at each age / age group), per-person and per-donor amounts donated for each age group (not sure about per-donor because of the previous effect, but presumably per-person amounts are higher), and then the overall analysis you did where either an offset or a direct 0->something mapping is applied so that the two effects can be aggregated.
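As a sketch of what those three comparisons could look like, here is a small Python example. The records are invented toy numbers, not the survey data, the age breakdown is left out for brevity, and `summarize` is a hypothetical helper; the offset `c=1.0` corresponds to the existing log1p choice, with base-10 logs assumed.

```python
import math

# Hypothetical toy records: (group, annual donation in dollars). Not survey data.
records = [
    ("EA", 0), ("EA", 0), ("EA", 50), ("EA", 200), ("EA", 1000),
    ("non-EA", 0), ("non-EA", 0), ("non-EA", 0), ("non-EA", 100), ("non-EA", 400),
]

def summarize(amounts, c=1.0):
    """Compute the three comparisons for one group's donation amounts."""
    donors = [x for x in amounts if x > 0]
    return {
        "share_not_donating": 1 - len(donors) / len(amounts),
        "mean_per_person": sum(amounts) / len(amounts),
        "mean_per_donor": sum(donors) / len(donors) if donors else float("nan"),
        # Aggregate measure: zeros and amounts combined through the offset c.
        "mean_log_offset": sum(math.log10(c + x) for x in amounts) / len(amounts),
    }

summary = {}
for group in ("EA", "non-EA"):
    amounts = [x for g, x in records if g == group]
    summary[group] = summarize(amounts)
    print(group, summary[group])
```

Reporting the first two measures separately keeps the "who donates at all" effect apart from the "how much donors give" effect, which the offset-based measure blends together.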
(I don’t have R on this computer, or I would just generate the graphs I would have liked for you to make. Thanks for putting in that effort!)