Eh, this feels wrong to me. Specifically, the argument feels over-complicated.
As best I can tell, the predominant mode of science in fields affected by the replication crisis is that they do causal inference by sampling from noisy posteriors.
The predominant mode of science in fields not affected by the replication crisis is that they don’t do this, or do it less.
Most of the time, it seems science is conducted that way in those fields because it has to be. Can you come up with a better way of doing Psychology research? “Science in hard fields is hard” is definitely a less sexy hypothesis, but it seems obviously true.
They’re measuring a noisy phenomenon, yes, but that’s only half the problem. The other half of the problem is that society demands answers. New psychology results are a matter of considerable public interest and you can become rich and famous from them. In the gap between the difficulty of supply and the massive demand grows a culture of fakery. The same is true of nutrition: everyone wants to know what the healthy thing to eat is, and the fact that our current methods are incapable of discerning this is no obstacle to people who claim to know.
For a counterexample, look at the field of planetary science. Scanty evidence dribbles in from occasional spacecraft missions and telescopic observations, but the field is intellectually sound because public attention doesn’t rest on the outcome.
Can you come up with a better way of doing Psychology research?
Yes. More emphasis on concrete useful results, less emphasis on trying to find simple correlations in complex situations.
For example, “Do power poses work?”. They did studies like this one where they tell people to hold a pose for five minutes while preparing for a fake job interview, and then found that the pretend employers pretended to hire them more often in the “power pose” condition. Even assuming there’s a real effect where those students from that university actually impress those judges more when they pose powerfully ahead of time… does that really imply that power posing will help other people get real jobs and keep them past the first day?
That’s like studying “Are car brakes really necessary?” by setting up a short track and seeing if the people who run the “red light” progress towards their destination quicker. Contrast that with studying the cars and driving behaviors that win races, coming up with your theories, and testing them by trying to actually win races. You’ll find out very quickly if your “brakes aren’t needed” hypothesis is a scientific breakthrough or foolishly naive.
Instead of studying “Does CBT work?”, study the results of individual therapists, see if you can figure out what the more successful ones are doing differently than the less successful ones, and see if you can use what you learn to increase the effectiveness of your own therapy or the therapy of your students. If the answer turns out to be “The successful therapists all power pose pre-session, then perform textbook CBT” and that allows you to make better therapists, great. If it’s something else, then you get to focus on the things that actually show up in the data.
The results should speak for themselves. If they don’t, and you aren’t keeping in very close contact with real world results, then it’s super easy to go astray with internal feedback loops because the loop that matters isn’t closed.
The reason I trust research in physics in general is that it doesn’t end with publishing a paper. It often ends with building machines that depend on that research being right.
We don’t just “trust the science” that light is a wave; we use microwave ovens at home. We don’t just “trust the science” that relativity is right; we use the relativistic equations to adjust GPS measurements. Therefore it would be quite surprising to find out that any of these underlying theories is wrong. (I mean, it could be wrong, but it would have to be wrong in the right way that still keeps the GPS and the microwave ovens working. That limits the possibilities of what the alternative theory could be.)
Therefore, in a world where we all do power poses all the time, and if you forget to do them, you will predictably fail the exam...
...well, actually that could just be a placebo effect. (Something like seeing a black cat on your way to exam, freaking out about it, and failing to pay full attention to the exam.) Damn!
The reason I trust research in physics in general is that it doesn’t end with publishing a paper. It often ends with building machines that depend on that research being right.
We don’t just “trust the science” that light is a wave; we use microwave ovens at home.
Well said. I’m gonna have to steal that.
Therefore, in a world where we all do power poses all the time, and if you forget to do them, you will predictably fail the exam...
...well, actually that could just be a placebo effect.
Yeah, “Can I fail my exam” is a bad test, because when the test is “can I fail”, it’s easy for the theory to be “wrong in the right way”. GPS is a good test of GR because you just can’t do it without a better understanding of spacetime, so the theory has to at least get something right even if it’s not the full picture. When you actually use the resulting technology in your day-to-day life and get results you couldn’t have gotten before, then it almost doesn’t matter what the scientific literature says, because, in the line usually attributed to Einstein, “I would feel sorry for the good Lord. The theory is correct.”
There are psychological equivalents of this, which rest on doing things that are simply beyond the abilities of people who lack this understanding. The “NLP fast phobia cure” is a perfect example of this, and I can provide citations if anyone is interested. I really get a kick out of the predictable arguments between those who “trust the science” but don’t understand it, and those who actually do it on a regular basis.
(Something like seeing a black cat on your way to exam, freaking out about it, and failing to pay full attention to the exam.) Damn!
This reminds me of an amusing anecdote.
I had a weird experience once where I sprained my ankle pretty badly and found myself simultaneously indignantly deciding that my ankle wasn’t going to swell and also thinking I was crazy for feeling like swelling was a thing I could control. And it didn’t swell. I told my friend about this experience, and while she was skeptical and thought it sounded crazy, she tried it anyway and her next several injuries didn’t swell.
Eventually she casually mentioned to someone “Nah, my broken thumb isn’t going to swell because I decided not to”, and the person she was talking to responded as if she had said something else, because his brain just couldn’t register what she actually said as a real possibility. She then got all self-conscious about it and was kinda unintentionally gaslighted into feeling like she was crazy for thinking she could do that, and her thumb swelled up.
I had to call her and remind her “No, you don’t give up and expect it to swell because it ‘sounds crazy’, you intend for it to not swell anyway and find out whether it is something you can control or not”. The swelling went back down most of the way after that, though not to the same degree as in the previous cases where the injury never swelled in the first place.
The problem with this model is that the “bad” models/theories in replication-crisis-prone fields don’t look like random samples from a wide posterior. They have systematic, noticeable, and wrong (therefore not just coming from the data) patterns to them, especially patterns which make them more memetically fit, e.g. fitting a popular political narrative. A model which just says that such fields are sampling from a noisy posterior fails to account for the predictable “direction” of the error which we see in practice.
I made a mistake of omission in just saying “sampling from noisy posteriors”; note that I didn’t say they were performing unbiased sampling.
To extend the Psychology example: a study could be considered a sampling technique of the noisy posterior. You appear to be arguing that the extent to which this is a biased sample is a “skill issue.”
I’m arguing that it is often very difficult to perform unbiased sampling in some fields; the issue might be a property of the posterior and not that the researcher has a weak prefrontal cortex. In this framing it would totally make sense if two researchers studying the same/correlated posterior(s) are biased in the same direction: it’s the same posterior!
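To make the “biased sampling” framing a bit more concrete, here is a minimal toy sketch. It is not a model of any real field. It assumes a true effect of zero, studies that are individually unbiased but noisy, and a hypothetical selection filter that only lets positive, “significant” estimates through. All names and numbers (`run_study`, `simulate`, the 1.96 cutoff, the sample sizes) are made up for illustration, and the filter is only one possible mechanism; it doesn’t settle whether the bias comes from the researchers or from the posterior.

```python
import random
import statistics


def run_study(true_effect, noise_sd, n, rng):
    """One toy study: the mean of n noisy measurements of true_effect."""
    return statistics.fmean(rng.gauss(true_effect, noise_sd) for _ in range(n))


def simulate(num_studies=5000, true_effect=0.0, noise_sd=1.0, n=20, seed=0):
    """Run many unbiased studies, then apply a one-sided selection filter."""
    rng = random.Random(seed)
    standard_error = noise_sd / n ** 0.5
    # Assumption: a result is "selected" only if it is positive and clears
    # roughly a 1.96 standard-error cutoff.
    threshold = 1.96 * standard_error
    all_estimates = [
        run_study(true_effect, noise_sd, n, rng) for _ in range(num_studies)
    ]
    selected = [estimate for estimate in all_estimates if estimate > threshold]
    return all_estimates, selected, threshold


all_estimates, selected, threshold = simulate()
print(f"mean of all studies:      {statistics.fmean(all_estimates):+.3f}")
print(f"mean of selected studies: {statistics.fmean(selected):+.3f}")
print(f"fraction selected:        {len(selected) / len(all_estimates):.3f}")
```

In this toy setup the full set of studies averages out close to the true effect, while the selected subset is pushed in one direction by construction. So noise plus a shared filter is enough to produce errors with a predictable “direction”, even when no individual study is biased.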