The comparison to parapsychology is a really poor one in this case—for what should be pretty obvious reasons.
Well, whatever this might say about me, the reasons aren’t obvious to me.
For example, we know there is no file drawer effect.
Right, but as I understand it, you don’t need a file drawer effect to see that some of the experiments done in parapsychology still have devastatingly tiny p-values on their own, such as the work done through the Stanford Research Institute. So the file drawer effect isn’t really the right way to challenge the analogy.
But more importantly, this was a six-sigma deviation from the theoretical prediction. As far as I know, that is unheard of in parapsychology.
I actually don’t know what that means. Is sigma being used to indicate standard deviation? If so, then yes, there have been a number of parapsychology experiments in that range—some beyond it, if I recall correctly. (It has been many years since I read into that stuff, so I could be misremembering.)
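To make sure we’re talking about the same quantity, here’s a quick sketch (purely my own illustration, assuming “sigma” just means standard deviations of a normally distributed test statistic under the null) of how sigma levels translate into the p-values I’m more used to thinking in:

```python
# Rough illustration only: convert an n-sigma deviation into a two-sided
# p-value, assuming a normal null distribution for the test statistic.
import math

def two_sided_p(sigma):
    """P(|Z| >= sigma) for a standard normal Z."""
    return math.erfc(sigma / math.sqrt(2))

for n in (2, 3, 6, 10):
    print(f"{n:>2} sigma -> p ~ {two_sided_p(n):.2e}")

# Approximate output:
#  2 sigma -> p ~ 4.55e-02
#  3 sigma -> p ~ 2.70e-03
#  6 sigma -> p ~ 1.97e-09
# 10 sigma -> p ~ 1.52e-23
```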
We cannot treat physics the way we treat psychology.
My point is actually more about statistics than science, so any system that uses frequentist statistics to extract truth is going to suffer from this kind of comparison. As I understand it, the statistical methods that are used to verify measurements like this FTL neutrino phenomenon are the same kinds of techniques used to demonstrate that people can psychokinetically affect random-number generators. So either parapsychology is ridiculous because it uses bad statistical methods (in which case there’s a significant chance that this FTL finding is a statistical error), or we can trust the statistical methods that CERN used (which seems to force us to trust the statistical methods that parapsychologists use).
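To make that dichotomy concrete, here’s a toy sketch (my own, with made-up numbers, not taken from any actual study) of how the same z-test machinery will certify a tiny bias as overwhelmingly “significant” once the trial count is large enough, whether the trials are RNG button-presses or timing measurements:

```python
# Toy illustration with invented numbers: a 0.1% bias over ten million
# binary trials. The bias could be "psychokinesis" or an unnoticed
# equipment/protocol artifact; the test can't tell the difference.
import math
import random

random.seed(0)
N = 10_000_000
true_rate = 0.501                               # hypothetical tiny bias

hits = sum(random.random() < true_rate for _ in range(N))
p_hat = hits / N
z = (p_hat - 0.5) / math.sqrt(0.25 / N)         # z-score against a fair-coin null
p_value = math.erfc(abs(z) / math.sqrt(2))      # two-sided p-value

print(f"hit rate {p_hat:.4f}, z = {z:.1f} sigma, p = {p_value:.1e}")
# Typical run: hit rate near 0.5010, z in the neighborhood of 6 sigma,
# p on the order of 1e-9 or smaller, despite an effect size that is
# practically invisible.
```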
(Disclaimer: I’m not trying to argue anything about parapsychology here. I’m only attempting to point out that, best as I can tell, the argument for parapsychology as the control group for science seems to suggest that the CERN results stand a fair chance of being bad statistics in action. If A implies B and we’re asserting probably-not-B, then we have to accept probably-not-A.)
Right, but as I understand it, you don’t need a file drawer effect to see that some of the experiments done in parapsychology still have devastatingly tiny p-values on their own, such as the work done through the Stanford Research Institute. So the file drawer effect isn’t really the right way to challenge the analogy.
How is that?
I actually don’t know what that means. Is sigma being used to indicate standard deviation? If so, then yes, there have been a number of parapsychology experiments in that range—some beyond it, if I recall correctly. (It has been many years since I read into that stuff, so I could be misremembering.)
You need to provide links because I read a fair bit on the subject and don’t recall this. If I came across such results my money would be on fraud or systematic error, not a statistical fluke.
So either parapsychology is ridiculous because it uses bad statistical methods (in which case there’s a significant chance that this FTL finding is a statistical error), or we can trust the statistical methods that CERN used (which seems to force us to trust the statistical methods that parapsychologists use).
This is the kind of “outside-view-taken-to-the-extreme” attitude that just doesn’t make sense. We know why the statistical results of parapsychological studies tend not to be trustworthy: publication bias, the file drawer effect, exploratory research retroactively turned into hypothesis testing, and so on. If we didn’t know why such statistical results couldn’t be trusted, then we would be compelled to seriously consider parapsychological claims. My claim is that those reasons don’t apply to neutrino velocity measurements.
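To spell out just one of those mechanisms, here is a toy simulation (mine, with arbitrary parameters) of the file drawer effect: every study in it has a true effect of exactly zero, yet if only the “significant” above-chance results get written up, the published record still looks like evidence for a small effect.

```python
# File-drawer toy model with arbitrary parameters: 1000 studies of a fair
# coin (true effect = 0), each with 400 trials; only above-chance results
# with p < 0.05 get "published".
import math
import random

random.seed(0)
n_studies, n_trials = 1000, 400

published = []
for _ in range(n_studies):
    hits = sum(random.random() < 0.5 for _ in range(n_trials))
    p_hat = hits / n_trials
    z = (p_hat - 0.5) / math.sqrt(0.25 / n_trials)
    p = math.erfc(abs(z) / math.sqrt(2))        # two-sided p-value
    if p_hat > 0.5 and p < 0.05:                # the rest stay in the drawer
        published.append(p_hat)

print(f"{len(published)} of {n_studies} null studies get 'published'")
print(f"mean published hit rate: {sum(published) / len(published):.3f}")
# Typical run: roughly 20-30 studies clear the bar, averaging a hit rate
# around 0.55, even though every study was flipping a fair coin.
```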
You need to provide links because I read a fair bit on the subject and don’t recall this.
That’s a fair request. I don’t really have the time to go digging for those details, though. If you feel so inspired, again I’d point to the work done at the Stanford Research Institute (or at least I think it was there) where they ran a ridiculous number of trials of all kinds and got results several standard deviations away from the mean predicted by the null hypothesis. I honestly don’t remember the numbers at all, so you could be right that there has never been anything like a six-s.d. deviation in parapsychological experiments. I seem to recall that they got somewhere around ten—but it has been something like six years since I read anything on this topic.
That said, I get the feeling there’s a bit of goalpost-moving going on in this discussion. In Eliezer’s original reference to parapsychology as the control group for science, his point was that there are some amazingly subjective effects that come into play with frequentist statistics, and that these could account for even the good (by frequentist standards) positive-result studies from parapsychology. I agree that there are a lot of problems with publication bias and the like, and that does offer an explanation for a decent chunk of parapsychology’s material. But to quote Eliezer:
Parapsychology, the control group for science, would seem to be a thriving field with “statistically significant” results aplenty. Oh, sure, the effect sizes are minor. Sure, the effect sizes get even smaller (though still “statistically significant”) as they collect more data. Sure, if you find that people can telekinetically influence the future, a similar experimental protocol is likely to produce equally good results for telekinetically influencing the past. Of which I am less tempted to say, “How amazing! The power of the mind is not bound by time or causality!” and more inclined to say, “Bad statistics are time-symmetrical.” But here’s the thing: Parapsychologists are constantly protesting that they are playing by all the standard scientific rules, and yet their results are being ignored—that they are unfairly being held to higher standards than everyone else. I’m willing to believe that. It just means that the standard statistical methods of science are so weak and flawed as to permit a field of study to sustain itself in the complete absence of any subject matter.
I haven’t looked at the CERN group’s methods in enough detail to know if they’re making the same kind of error. I’m just trying to point out that we can’t assign an abysmally low probability to their making a common kind of statistical error (the kind that produces a small effect with a very low p-value) without simultaneously assigning parapsychologists a lower probability of making that same mistake than Eliezer seems to.
And to be clear, I am not saying “Either the CERN group made statistical errors or telepathy exists.” Nor am I trying to defend parapsychology. I’m simply pointing out that we have to be even-handed in our dismissal of low-p-value thinking.
Sure, if you find that people can telekinetically influence the future, a similar experimental protocol is likely to produce equally good results for telekinetically influencing the past. Of which I am less tempted to say, “How amazing! The power of the mind is not bound by time or causality!” and more inclined to say, “Bad statistics are time-symmetrical.”
That doesn’t actually strike me as all that much extra improbability. A whole bunch of the mechanisms would allow both!