The Science of Cutting Peppers
Summary: Rigorous scientific experiments are hard to apply in daily life but we still want to try out and evaluate things like self-improvement methods. In doing so we can look for things such as a) effect sizes that are so large that they don’t seem likely to be attributable to bias, b) a deep understanding of the mechanism of a technique, c) simple non-rigorous tests.
Hello there! This is my first attempt at a top-level post and I’ll start it off with a little story.
Five years ago, in a kitchen in London...
My wife: We’re going to have my friends over for dinner and we’re making that pasta sauce everyone likes. I’m going to need you to cut some red peppers.
Me: Can do! *chop chop chop*
My wife: Hey, Mr. Engineer, you’ve got seeds all over! What are you doing to that pepper?
Me: Well, admittedly this time I was a bit clumsy and there’s more seed spillage than usual—but it’s precisely to avoid spilling seeds that I start by surgically removing the core and then...
My wife: Stop, just stop. That’s got to be the worst possible way to do this. See, this is how you cut a pepper, *chop chop chop*. Nice slices, no mess.
Me: *is humiliated* *learns*
Now, ever since then I’ve cut peppers using the method my wife showed me. It’s a much better way to do it. But wait! How do I know that? Don’t I realize that humans are subject to massive cognitive biases? Maybe I just want to please my wife by doing things her way so I’ve convinced myself her method is better. Maybe I’m remembering the hits and forgetting the misses: maybe I’ve forgotten all the times my method worked out great and the times her method failed. Maybe I am indeed making less of a mess than I used to but it’s simply due to my knife skills improving, and that would have happened even if I’d stuck with the old method. And there are probably a dozen more biases and confounding factors that could come into play that I haven’t even thought of.
Don’t I need to do a test? How about cutting up lots of peppers using the two alternative methods and measuring seed spillage? But, no, that’s not good enough—I could subconsciously affect the result by applying less skill when using one method. I’d need a neutral party to operate the methods, preferably a number of people. And I’d need a neutral observer too. The person who measures the seed spillage from each operation should not know which method was used. Yeah, a double blind test, that’s the ticket. That’s what I should do, right?
No, obviously that’s not what I should do. There are two reasons:
A) The resources needed to conduct the suggested test are completely disproportional to any benefit such a test might yield.
B) I already bloody well know that my wife’s method is better.
The first reason is obvious enough but the second reason needs a bit more exploration. Why do I know this? I think there are two reasons.
* The effect size is large and sustained. Previously, I used to make a mess just about every time. After I switched methods I get a clean cut just about every time.
* I understand the mechanism explaining the effect very well. I can see what’s wrong with the method I was using previously (if I try to pull the core through a hole that’s too small for its widest part then some seeds will rub off) and I can see how my wife’s method doesn’t have that problem (no pulling the core through a hole, just cut around it).
I’d like to try to generalize from this example. Many people on this site are interested in methods for self-improvement, e.g. methods for fighting akrasia or developing social skills. Very often, those methods have not been tested scientifically and we do not ourselves have the resources to conduct such tests. Even in cases where there have been scientific experiments we cannot be confident in applying the results to ourselves. Even if a psychology experiment shows that a certain way of doing things has a statistically significant[1] effect on some group that is no guarantee that it will have an effect on a particular individual. So, it is no surprise that discussion of self-improvement methods is frequently met with skepticism around here. And that’s largely healthy.
But how can we tell whether a self-improvement method is worth trying out? And if we do try it, how can we tell if it’s working for us? One thing we can do, like in the pepper example, is to look for large effects and plausible mechanisms. Biases and other confounding factors make it hard for us to tell the difference between a small negative effect, no effect and a small positive effect. But we still have a decent chance of correctly telling the difference between no effect and a large effect.
Another thing we can do is to use some science. Just because a rigorous double blind test with a hundred participants isn’t practical doesn’t mean we can’t do any tests at all. A person trying out a new diet will weigh themselves every day. And if you’re testing out a self-improvement technique then you can try to find some metric that will give you an idea of how well you are doing. Trying out a method for getting more work done on your dissertation? Maybe you should measure your daily word count; it’s not perfect but it’s something. As xkcd’s Zombie Feynman would have it, “Ideas are tested by experiment, that is the core of science.”
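To make the word-count idea concrete, here is a minimal sketch of what such a non-rigorous test might look like, assuming you logged your daily word count for a week before and a week after trying the new method. The numbers are invented purely for illustration, and the `effect_size` helper is my own hypothetical name, not something from a library. It computes the change in the daily average and a standardized effect size (Cohen's d, the difference in means divided by the pooled standard deviation).

```python
from statistics import mean, stdev


def effect_size(before, after):
    """Return (difference in means, Cohen's d) for two lists of daily measurements."""
    diff = mean(after) - mean(before)
    n1, n2 = len(before), len(after)
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    pooled_var = (
        (n1 - 1) * stdev(before) ** 2 + (n2 - 1) * stdev(after) ** 2
    ) / (n1 + n2 - 2)
    return diff, diff / pooled_var ** 0.5


# Hypothetical daily word counts, made up for this example.
before = [220, 310, 150, 280, 240, 190, 260]
after = [520, 610, 480, 700, 560, 590, 630]

diff, d = effect_size(before, after)
print(f"mean change: {diff:.0f} words/day, standardized effect: {d:.1f}")
```

This does nothing to remove the biases discussed above: you know which method you are using, and plenty else may have changed between the two weeks. All it can do is help you tell "roughly nothing happened" apart from "something big happened", which is the distinction we have a decent chance of getting right.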
Erring on the side of too much credulity is bad and erring on the side of too much skepticism is also bad. Both prevent us from becoming stronger.
[1] As good Bayesians we, of course, find psychologists’ obsession with null hypotheses and statistical significance to be misguided and counterproductive. But that’s a story for another time.