I assign very low probability to psi for several reasons. The first is evolution. If psi is an evolved feature of human brains, then it should be universal in humans and provide fitness advantages, but I’ve seen no evidence of these fitness advantages.
The second is that it is convenient. Humans like fantasizing about having superpowers. That is by far a simpler explanation for widespread belief and academic study.
The third is the economic argument, as explained by Randall Munroe here.
The fourth you’ve already mentioned; psi is an almost textbook example of the mind-projection fallacy.
None of these are knock-down arguments, but together they convince me that psi is not worth looking into unless something changes (a big result published in a mainstream journal that replicates, e.g.).
A big result was very recently published in a mainstream journal and has been replicated. Now what?
ETA: See Carl’s correction. I can’t immediately find the alleged replications and there have been a few failed replications.
Well, it has failed to replicate a few times. Link to the replication please?
I couldn’t immediately find it when I searched a few minutes ago. I remember reading that it had failed to replicate like twice out of like seven and succeeded the other five times. (I remember having read your article and thus was surprised to see it had in fact replicated. This was before I took psi at all seriously.) I presume that someone wouldn’t straight-up lie about how many times it’d been replicated. I think I was reading a reputable source but am not super-confident of that. Help, anyone? If not I’ll try searching again soon.
I updated my post on this with numbers and links for several more of the attempted replications.
Looking back at this exchange, I want to note that you were able to invert the public empirical data on this question, along with the several other errors elsewhere in this thread, despite (or because of) the overconfidence of your initial claims. Similar things have happened when you have stuck your neck out on decision theory for attentive experts to chop off. You should generalize and a) put more effort into disconfirming your ideas; b) reduce your confidence in seemingly crazy contrarian views backed by vague impressions (lacking good metadata, for instance) from wide-ranging reading.
Something interesting happened here: for a few months, whenever I doubted my memory it always turned out that my memory was correct and my doubt was needless, so eventually I decided to stop doubting my memory as much and to trust it more. As soon as I started doing that, I inverted the replication results, which implies that the doubt itself was keeping my memory honest. I’m not sure whether that should have been the obvious model beforehand.
Also, our discussion caused me to update somewhat away from thinking that parapsychology is positive rather than neutral or negative evidence for psi; that wasn’t a belief I held strongly in the first place. I think it’s unfortunate that the focus was on Bem rather than on, say, PEAR, and I would like to discuss the PEAR studies specifically at some point, those being the studies I am most familiar with. Anyway, thanks for putting so much effort into looking into the question; I think it’d be cool if you made a post specifically about lessons learned from psi and how they apply to other fields, especially the heuristics-and-biases parts of social psychology. The last paragraph of your most recent post was, I think, the most important.
One lesson is that it’s possible to waste the valuable time and money of many people by not checking claims before throwing them out. Bogus papers and other nonsense can create negative externalities quite a bit larger than their cost of production. Applying local checks (confirmatory experiments, active search for disconfirmation) is worth doing before fouling the common pool.
If you want to talk about PEAR you should present your arguments and references, and make a prediction about how much of the important stuff (as judged later) you have found. I don’t want to play whack-a-mole.
I get the impression that the opportunity cost of your time is high and that I could never be confident enough that my presentation of the arguments was at a sufficiently high level that it’d be worth taking the risk of imposing even a minor moral obligation on you to respond, so that’ll probably never happen.
Your continued posting on this is more trouble to me than efficiently responding. A few quick points:
The PEAR people are now selling supposed psi-controlled-meditation-lamps for $189 each as well as DVDs and other junk; the PEAR research was donation supported, and positive results meant more donations
In Damien Broderick’s pro-psi book (in agreement with the PEAR docs) he notes that “PK” effects show up in the literature even when the targets are already set in advance (e.g. digits of pi). Broderick’s account is that the psi reaches across time and space; but bad statistics are time-symmetrical, while PK across time takes a big probability penalty (and even aside from time, the people don’t get to see or be present at the setting of the numbers).
PK experiments “worked” with macroscopic objects like dice, and were said to so work by psi-proponents like Radin (they also have positive meta-analyses, though the effects decline with better controls for cheating, misrecording, etc.), but the effect can’t be delivered for tasks like moving very light (stationary) objects or affecting ultra-sensitive scales; bad statistics work regardless of scale, but for psi that’s a wacky combination
Failure of replication
Failure of registration, lack of blinding, and lack of pre-specified confirmatory studies with large fixed samples (this could have been done by putting a random number generator in the hands of a third party, with electronic communication, leaving an unambiguous trail)
Effect sizes so small (a few per ten thousand according to PEAR) that the combination of biased errors in data entry (found in audits of other studies), tiny amounts of fraud (there were many staff over time), some publication bias, etc, could easily generate the results
The work was broken up into many individualized experiments; combined with optional stopping and similar effects, this enables concentration of “hits” in the published component (the larger the study, the smaller the effect)
Multiple long-term and rotating employees had the opportunity for some fraud; it’s not a question of a single person
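The effect-size point in the list above can be put in rough numbers. The figures below are illustrative assumptions for a back-of-the-envelope check, not actual PEAR data:

```python
import math

# Back-of-the-envelope: PEAR-scale effects are a few hits per ten
# thousand, so a tiny biased error rate could account for them.
# All concrete numbers below are illustrative assumptions.

n = 1_000_000                 # hypothetical total binary trials
p0 = 0.5                      # chance hit rate
effect = 3 / 10_000           # claimed excess hit rate over chance

excess = n * effect                       # ~300 excess hits
sigma = math.sqrt(n * p0 * (1 - p0))      # 500.0, chance std. deviation
z = excess / sigma                        # ~0.6 sigma for this run

# A one-directional data-entry error rate equal to the claimed effect
# (3 bad entries per 10,000 records) would produce the whole anomaly:
errors_per_10k = excess / n * 10_000      # ~3.0

print(f"excess hits: {excess:.0f}, z = {z:.2f}")
print(f"biased entries per 10,000 that suffice: {errors_per_10k:.1f}")
```

Note that on these invented numbers the claimed shift is under one standard deviation for a million-trial run, which is why enormous databases and aggregation are needed to see it at all.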
Aight, then I won’t post about parapsychology.
Thanks for the quick points. I disagree on a few of them, but I think it’s essentially certain that you’re taking into account the significance of the failures of registration and replication in ways that I don’t have enough knowledge to have done, which almost certainly overrides any superior knowledge I might have on the points where I disagree.
Also I really would like an example of a case where I stuck my head out about decision theory and it was chopped off; I think there’s a serious risk that you’re overgeneralizing, especially as I never had much confidence in (my appraisal of) the worth of the parapsychology literature in the first place.
ETA: My interest in parapsychology was explicitly the result of rationalization; I started out by thinking that psi was real, then looked at the literature to see which parts seemed like legitimate support of that known fact. Unsurprisingly the rationalized findings weren’t as good as they seemed. This style of model-building has very little to do with the style of model-building I use when actually thinking, e.g. thinking about decision theory or moral philosophy generally.
Similar things have happened when you have stuck your neck out on decision theory for attentive experts to chop off.

Huh? Link/example?
Inverting the replication results is the only error I see; I admitted to another error that I don’t actually think was an error as such (I think I was trying too hastily to appear reasonable and conciliatory) and intend to go back and explain why I disagree. I made a bad analogy but that doesn’t defeat the point I was trying to argue with the analogy.
Note that being flustered and making abnormal errors while arguing about parapsychology is exactly what my model of evasive psi predicts.
This video has Samuel Moulton discussing an unpublished, unregistered failure to replicate Bem’s work.
Try Google Scholar, and papers citing Bem. In my article I mentioned and linked one apparent replication that turned out not to be one. I described the Wiseman paper’s saga. Here’s something claiming to be another failure to replicate, conducted at a UFOlogy event, and citing another failed replication (although maybe not a closely-matched one).
If you were hanging out at psychic blogs they might have been stretching the definition of replication to mean anything supporting time-traveling ESP (regardless of whether it uses the same procedures, gets the same results, was written after Bem, or was pre-registered), or they might have access to work that has not yet been put up on the internet in working paper or publication form.
ETA: The wikipedia article on Daryl Bem says that at least one of the Wiseman-registered studies has now replicated something from Bem, but doesn’t identify the study or provide a source for the claim.
Yeah, if a majority of sanely-conducted replications of that study succeed, I’ll reassess and take psi much more seriously. But so far the Bem saga is almost exactly what I would expect to see given that psi is fake.
Makes sense. If I find any replicated studies in big journals I’ll let you know; I only remembered the Bem one because it was pretty recent.
Thanks, I’d appreciate that. I’d also love to hear about other kinds of evidence that are approximately as convincing as replicated big journal studies. I don’t expect that evidence to exist or else I’d have updated already, but if it does that information is valuable.
See Wagenmakers et al., “Why psychologists must change the way they analyze their data: the case of psi”, for why the Bem results are simply artifacts of bad method and statistics. Pdf here.
Wikipedia:

The methods that Bem used in his experimentation have also been viewed as controversial. According to accepted statistical methodology, Bem incorrectly provides one-sided p values when he should have used two-sided p values.[17] This could account for the marginally significant results that he produced in his experiment. A rebuttal to the Wagenmakers et al. critique by Bem and two statisticians was subsequently published in the Journal of Personality and Social Psychology.[18]
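For intuition on the one-sided vs. two-sided issue: with a symmetric null at 50%, the two-sided p value is double the one-sided tail, which can push a marginal result past the 0.05 line. A toy binomial sketch, where the 527-hit count and trial number are invented, Bem-like figures, not Bem’s actual data:

```python
from math import comb

# Toy illustration of one-sided vs. two-sided binomial p values.
# n and k are invented, Bem-like numbers, not Bem's actual data.
n, k, p = 1000, 527, 0.5      # 527 hits in 1000 fifty-fifty trials

def binom_pmf(n, i, p):
    # Exact binomial probability of i hits in n trials.
    return comb(n, i) * p**i * (1 - p)**(n - i)

# One-sided: probability of k or more hits under the chance null.
p_one = sum(binom_pmf(n, i, p) for i in range(k, n + 1))

# Two-sided (symmetric null at p = 0.5): double the one-sided tail.
p_two = min(1.0, 2 * p_one)

print(f"one-sided p = {p_one:.4f}")   # marginally below 0.05
print(f"two-sided p = {p_two:.4f}")   # comfortably above 0.05
```

So the same invented data set is “significant” one-sided and non-significant two-sided, which is exactly the kind of marginal result the critique is pointing at.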
When I have seen back-and-forths like this it’s always been the pro-psi parapsychologists who understood the statistics better. Do you or does anyone know if that isn’t true in this case?
One common error made by skeptics is to say that the low prior on psi means that, after a Bayesian correction, no individual experiment or paper is enough to drive belief in psi, so it is “not scientific evidence.” That’s an overstatement, partly driven by scientific etiquette: if psi were real, one would combine the odds ratios of multiple experiments (insofar as they were honest and independent) and overcome the prior, so the individual pieces would have to be published for that evidence to accumulate. The real reason one can’t aggregate studies like that is that there are systematic errors, bias, and fraud; given the sheer extent of those observed in the record, it’s very hard for a set of experiments to provide decisive evidence without some extraordinary evidence supporting their quality and honesty. Jaynes has an impressively thorough discussion of these issues in his probability textbook. The linked paper critiquing Bem didn’t make that error.
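The aggregation point, and the way systematic error caps it, can be made concrete with a toy calculation. Every figure here (the prior odds, likelihood ratios, and corruption rate) is an invented assumption for illustration:

```python
# Toy numbers for the aggregation argument: honest, independent
# experiments multiply their likelihood ratios and can overcome a tiny
# prior, but a small per-experiment chance of systematic error or fraud
# caps each result's weight. Every figure here is an invented assumption.

def effective_lr(nominal_lr, f):
    """Likelihood ratio of a positive result allowing for corruption.

    Assumes a corrupted experiment (probability f) reports a positive
    regardless of truth, while an honest one has false-positive rate
    1 / nominal_lr; the result is capped at 1 / f as nominal_lr grows.
    """
    return 1.0 / (f + (1.0 - f) / nominal_lr)

prior_odds = 1e-12                  # invented prior odds on psi

# If experiments were certainly honest, twenty of them at LR 10 apiece
# would swamp the prior (posterior odds ~1e8):
ideal = prior_odds * 10.0 ** 20

# A 1% corruption rate barely dents modest results but caps spectacular
# ones, so no single study can be decisive:
print(effective_lr(10.0, 0.01))     # ~9.2, close to the nominal 10
print(effective_lr(1e8, 0.01))      # ~100, nowhere near the nominal 1e8
```

This is the Jaynes-style point in miniature: once the chance of a corrupted experiment exceeds the false-positive rate of an honest one, extra nominal significance buys almost no extra evidence.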
Looking at the exchange you mention, here are links to the continuation:
http://dbem.ws/ResponsetoWagenmakers.pdf http://www.ruudwetzels.com/articles/ClarificationsForBemUttsJohnson.pdf
Part of the argument was over whether to expect effect sizes to be minuscule if psi exists (Bem argues that existing research has already disconfirmed big psi effects, so the penalty for that should be incorporated into our beliefs prior to his experiments, rather than into the odds ratio stemming from the experiments). The rest was over whether Bem engaged in data-mining. Bem denies it, but he has also written guides for students advocating intensive data-mining, and there are various suspicious elements in the paper that suggest it.
Both sides here seem to understand the statistics under discussion well enough; the back-and-forth is about psi and Bem’s methods or honesty, i.e. flaws in the experimental design, data mining, deception, or luck/file-drawer/publication-bias effects. Failures to replicate will indicate one or more of those (the replications will have to be checked for systematic flaws of their own, of course).
(Will continue the general discussion soon. Just airing out my brain a bit.)
Previously I claimed:
I’m thinking of, say, four or five times when I looked into it. I was wondering, does your experience agree with mine, or disagree?
Also, follow the links to the two blog posts mentioned at the bottom of this.
Can you link to a replication? I know of several failed replications that aren’t well known thanks to publication bias, but I’m unfamiliar with any successful ones.