I’m too lazy to do a better analysis now, but just to provide the barest of intuitions:
Let’s say a study with trillions of participants has shown that using Strategy A works better than not using Strategy A 80% of the time. I’m about to decide whether or not to use Strategy A, and unfortunately I don’t know about the study. I poll three of my friends who have all done rigorous self-experiments. (Or maybe I’ve done three rigorous self-experiments myself.) All it takes is a pocket calculator to show that I have about a 90% chance of correctly guessing whether I should use Strategy A: the chance that a majority of the three experiments point the wrong way is 0.2^3 + 3 × (0.8 × 0.2 × 0.2) = 0.104. And obviously if I poll myself, based on a single past rigorous self-experiment, I’ll have an 80% chance of getting it right.
(A better analysis would probably use the normal approximation for the binomial distribution, so we could see results for all sorts of parameters, but that would be a pain to write out with my voice recognition system.)
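Since the only obstacle is the tedium of writing it out, here is a minimal sketch of that fuller analysis in Python (exact binomial rather than the normal approximation, so the success rate and the number of self-experiments polled are free parameters):

```python
from math import comb

def p_majority_correct(p: float, n: int) -> float:
    """Chance that a majority of n independent self-experiments, each
    pointing the right way with probability p, gives the right answer.
    Assumes n is odd, so there are no ties."""
    k_needed = n // 2 + 1  # smallest possible majority
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_needed, n + 1))

# The 80%-effective Strategy A from above:
for n in (1, 3, 9):
    print(n, round(p_majority_correct(0.8, n), 3))   # 0.8, 0.896, 0.98

# A strategy that works only 51% of the time is another story:
for n in (1, 3, 9, 101):
    print(n, round(p_majority_correct(0.51, n), 3))  # creeps up slowly: ~0.51, 0.52, 0.52, 0.58
```

At 80% a handful of self-experiments already gets you most of the way; at 51% nothing short of a huge sample does, which is where the next point comes in.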
I suspect that scientific evidence is most useful on questions that are hard to decide (e.g. if Strategy A works 51% of the time; incidentally this sort of knowledge is also the most useless), or in cases where your degree of belief matters beyond just choosing whether or not to use a strategy (seems kind of rare).
This last point about degree of belief not mattering much could explain why Bayesian statistics didn’t catch on as well as frequentist statistics initially: most of the time, your exact degree of belief doesn’t matter and you just need to decide whether or not to do something.
You’re making a massive assumption: that self-experimentation is not biased worse than regular clinical trials by things like selection effects. This is what I mean by methodological concerns making each self-experiment far, far less than n=1. I mean, look at the OP—from the sound of it, the friend did not report their results anywhere (perhaps because they were null?). Bingo, publication bias. People don’t want to discuss null effects, they want to discuss positive results. I’ve seen this first-hand with dual n-back, among others, where I had trouble eliciting the null results even though they existed.
Given this sort of bias and zero effort on self-experimenters’ part to counter it, yes, you absolutely could do far worse than random by sampling 1000 self-experimenters compared to 1000 clinical trial participants! This is especially true for highly variable stuff like sleep, where you can spot any trend you like in all the noise—compare the dramatic confident anecdotes collected by Seth Roberts about vitamin D at night based on purely subjective retrospective recall of <10 nights to my actual relatively moderate findings based on 40 nights of Zeo data.
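To put a number on “spot any trend you like in all the noise”, here is a toy simulation (the sleep-score mean, the SD, and the 5-point “effect” threshold are made-up illustrative values, not Zeo numbers). Both periods are drawn from the same distribution, so any apparent improvement is pure noise:

```python
import random

def apparent_improvement(n_nights: int, sd: float = 10.0) -> float:
    """Mean 'after' score minus mean 'before' score when there is NO real
    effect: both periods are drawn from the same noise distribution."""
    before = [random.gauss(70, sd) for _ in range(n_nights)]
    after = [random.gauss(70, sd) for _ in range(n_nights)]
    return sum(after) / n_nights - sum(before) / n_nights

def false_positive_rate(n_nights: int, threshold: float = 5.0, trials: int = 10_000) -> float:
    """How often noise alone produces an apparent improvement of at least
    `threshold` points, given n_nights of data per period."""
    return sum(apparent_improvement(n_nights) >= threshold for _ in range(trials)) / trials

random.seed(0)
for n in (5, 10, 40):
    print(n, false_positive_rate(n))
# With 5-10 nights per period, noise alone "shows" a sizable improvement a
# substantial fraction of the time; with 40 nights it becomes much rarer.
```

And that is before adding the selection effect above, where the runs that happen to show nothing never get written up.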
(I actually have a little demonstration that someone is engaging in considerable confirmation bias, but I’m not done yet. I should be able to post the result in early May.)
I don’t necessarily disagree with you on any of this. Looks to me like we are talking past each other a little bit.
Something about your rough model doesn’t sit right with me (in addition to the stuff in gwern’s comment). Tentatively I’d put my finger on strategies like your hypothetical Strategy A being rarer than they look. I think it’s uncommon for a prospective lifestyle change to simultaneously:
have a much better chance than 50% of being worth implementing...
...yet not obviously be a good idea a priori
be something you’re not already doing
be easy for you and/or friends to test/implement
be non-obvious enough that published research on it doesn’t already exist
Well, obviously you have to decide on a case-by-case basis whether Real Science is necessary, but the Buttermind thing is looking pretty good:
http://quantifiedself.com/2011/01/results-of-the-buttermind-experiment/
Would you wait for a real study before trying this?
http://lesswrong.com/lw/ba6/alternate_card_types_for_anki/
W. T. F!?
A half stick of butter every day makes you smarter—and in contrast to an equivalent amount of other saturated fats? That’s really rather surprising. I would like to see more research on that. Because it is kind of awesome.
To be sure. I don’t think my line of argument should shut the door on self-experimentation. I’d just focus on low-risk, low-effort interventions as candidates. (Otherwise I’m likely to end up with more high-risk/high-effort false positives than I’d like.)
So it is! When I saw the original Seth Roberts blog post my reaction was to write it off as a probable fluke. The fact that it seems to replicate in a randomized trial with n = 45 makes me much more interested, especially as the relative speed-up from the butter remained at about 5% (suggesting Seth’s original result wasn’t just a high/low outlier). I’d have chosen a different experimental design, and I’ll have to take a look at the raw data to convince myself of the analysis, but it seems promising.
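For “take a look at the raw data”, the check I have in mind is nothing fancy, e.g. a permutation test on the per-participant speed-ups. A rough sketch follows; the CSV filename, column names, and group labels are placeholders I made up, not the actual format of the data behind the link above:

```python
import csv
import random

def load_speedups(path: str) -> dict[str, list[float]]:
    """Group per-participant arithmetic speed-ups by condition.
    Hypothetical columns 'group' and 'speedup_pct'; the real export
    will need its own parsing."""
    groups: dict[str, list[float]] = {}
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            groups.setdefault(row["group"], []).append(float(row["speedup_pct"]))
    return groups

def permutation_p(a: list[float], b: list[float], n_perm: int = 10_000) -> float:
    """One-sided permutation test: how often does shuffling the group labels
    give a mean difference (a minus b) at least as large as the observed one?"""
    observed = sum(a) / len(a) - sum(b) / len(b)
    pooled = a + b
    hits = 0
    for _ in range(n_perm):
        random.shuffle(pooled)
        if sum(pooled[:len(a)]) / len(a) - sum(pooled[len(a):]) / len(b) >= observed:
            hits += 1
    return hits / n_perm

groups = load_speedups("buttermind.csv")  # placeholder filename
print(permutation_p(groups["butter"], groups["control"]))  # placeholder group labels
```

If the roughly 5% relative speed-up survives that kind of dumb check, I’ll be much more comfortable with the headline result.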
As for the Anki thing, I probably wouldn’t wait! It’s the sort of low-effort, low-risk intervention that’s best for self-experimentation.