What do you think of Gelman’s criticism of the paper as, on scientific grounds, complete tosh? Or as he puts it, after a paragraph of criticisms that amount to that verdict, “the evidence from their paper isn’t as strong as they make it out to be”?
Well, the statistical criticisms they mention seem less damning than the statistical problems of the average psych paper.
Beyond all that, I found the claimed effects implausibly large. For example, they report that, among women in relationships, 40% in the ovulation period supported Romney, compared to 23% in the non-fertile part of their cycle.
This does seem rather large, unless they specifically targeted undecided swing voters. But it's far from the only psych paper with an unreasonably large effect size.
Basically, this paper probably actually only constitutes weak evidence, like most of psychology. But it sounds good enough to be published.
Incidentally, I have a thesis in mathematical psychology due in a few days, in which I (among other things) fail to replicate a paper published in Nature, no matter how hard I massage the data.
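To put a rough number on it (my own back-of-the-envelope from the figures quoted above, not a calculation from the paper or from Gelman): 40% versus 23% is a 17-percentage-point difference, i.e. an odds ratio of roughly

$$\frac{0.40/0.60}{0.23/0.77} \approx 2.2,$$

which would be a remarkably large within-person swing for something as stable as stated vote intention.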
Well, the statistical criticisms they mention seem less damning than the statistical problems of the average psych paper.
Talk about faint praise!
But it's far from the only psych paper with an unreasonably large effect size.
It’s far from the only psych paper Gelman has slammed either.
Basically, this paper probably actually only constitutes weak evidence, like most of psychology.
Such volumes of faint praise!
But it sounds good enough to be published.
The work of Ioannidis and others is well-known, and it’s clear that the problems he identifies in medical research apply as much or more to psychology. Statisticians such as Gelman pound on junk papers. And yet people still consider stuff like the present paper (which I haven’t read, I’m just going by what Gelman says about it) to be good enough to be published. Why?
Gelman says, and I quote, “...let me emphasize that I’m not saying that their claims (regarding the effects of ovulation) are false. I’m just saying that the evidence from their paper isn’t as strong as they make it out to be.” I think he would say this about 90%+ of papers in psych.
The work of Ioannidis and others is well-known, and it’s clear that the problems he identifies in medical research apply as much or more to psychology.
Medical research has massive problems of its own, because of the profit motive to fake data.
Statisticians such as Gelman pound on junk papers. And yet people still consider stuff like the present paper (which I haven’t read, I’m just going by what Gelman says about it) to be good enough to be published. Why?
Well, my cynical side would like to say that it’s not in anyone’s interests to push for higher standards—rocking the boat will not advance anyone’s career.
But maybe we’re holding people to unreasonably high standards. Expecting one person to be able to do psychology and neuroscience and stats and computer programming seems like an unreasonable demand, and yet this is what is expected. Is it any wonder that some people who are very good at psychology might screw up the stats?
I had wondered whether the development of some sort of automated stats program would help. By this, I mean that instead of inputting the data and running a t-test manually, the program determines whether the data is approximately normally distributed, whether taking logs will transform it to a normal distribution, and so forth, before running the appropriate analysis and spitting out a write-up which can be dropped straight into the paper.
It would save a lot of effort and avoid a lot of mistakes. If there is a consensus that certain forms of reporting are better than others, e.g.
Instead, what do we get? Several pages full of averages, percentages, F tests, chi-squared tests, and p-values, all presented in paragraph form. Better to have all possible comparisons in one convenient table.
Then the program could present the results in an absolutely standard format.
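To make that concrete, here is a minimal sketch of the kind of pipeline I have in mind (purely my own illustration, written against scipy; the Shapiro-Wilk screen, the log-transform fallback, the Mann-Whitney fallback and the one-line output format are all just assumptions, not anyone's recommended practice):

```python
# A toy "automated stats" pipeline, as described above: screen for normality,
# try a log transform, fall back to a non-parametric test, and emit a
# standardised one-line write-up. Illustrative only.
import numpy as np
from scipy import stats

def auto_compare(group_a, group_b, alpha=0.05):
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)

    def looks_normal(x):
        # Shapiro-Wilk as a rough normality screen (itself a judgement call).
        return stats.shapiro(x).pvalue > alpha

    if looks_normal(a) and looks_normal(b):
        name = "Welch's t-test"
        stat, p = stats.ttest_ind(a, b, equal_var=False)
    elif (a > 0).all() and (b > 0).all() and looks_normal(np.log(a)) and looks_normal(np.log(b)):
        name = "Welch's t-test on log-transformed data"
        stat, p = stats.ttest_ind(np.log(a), np.log(b), equal_var=False)
    else:
        name = "Mann-Whitney U test"
        stat, p = stats.mannwhitneyu(a, b, alternative="two-sided")

    return (f"{name}: statistic = {stat:.3f}, p = {p:.3f}, "
            f"n = ({len(a)}, {len(b)}), means = ({a.mean():.2f}, {b.mean():.2f})")

# Example with made-up data:
rng = np.random.default_rng(0)
print(auto_compare(rng.lognormal(0.0, 0.5, 40), rng.lognormal(0.3, 0.5, 40)))
```

Even this toy version makes the difficulty visible: every branch encodes a judgement call that a reviewer could reasonably dispute.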
Expecting one person to be able to do psychology and neuroscience and stats and computer programming seems like an unreasonable demand
Most papers have multiple authors. If you need to do heavy lifting in stats, bring a statistician on board.
whether the development of some sort of automated stats program would help
I don’t think so. First, I can’t imagine it being flexible enough (and if it’s too flexible, its reason for existence is lost), and second, it will just be gamed. People like Gelman think that the reliance on t-tests is a terrible idea anyway, and I tend to agree with him.
My preference is for a radical suggestion: make papers openly provide their data and their calculations (e.g. as a download). After all, this is supposed to be science, right?
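As a purely hypothetical illustration of what shipping the data and the calculations as a download could look like (the file name, column names and test choice below are invented for the sketch):

```python
# Hypothetical companion script published alongside a paper: load the shared
# raw data and recompute the headline comparison, so any reader can check it.
import pandas as pd
from scipy import stats

df = pd.read_csv("study_data.csv")              # raw data shipped with the paper
fertile = df.loc[df["fertile"] == 1, "support"]
non_fertile = df.loc[df["fertile"] == 0, "support"]

stat, p = stats.ttest_ind(fertile, non_fertile, equal_var=False)
print(f"Recomputed from the shared data: t = {stat:.2f}, p = {p:.3f}")
```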
This “radical” suggestion is now a funding condition of at least some UK research councils (along with requirements to publish publicly funded work in open-access form). A very positive move … if enforced.
Most papers have multiple authors. If you need to do heavy lifting in stats, bring a statistician on board.
I don’t think this just applies to heavy lifting: basic stats are pretty confusing, given that most standard tests seem to rely on the assumption of a normal distribution, which is a mathematical abstraction that rarely occurs in real life. And in practice, people don’t bring specialists on board, at least not that I have seen.
My preference is for a radical suggestion: make papers openly provide their data and their calculations (e.g. as a download). After all, this is supposed to be science, right?
I understand why this was not done back when journals were printed on paper, but it really should be done now.
basic stats are pretty confusing, given that most standard tests seem to rely on the assumption of a normal distribution
If a psych researcher finds “basic stats” confusing, he is not qualified to write a paper which looks at statistical interpretations of whatever results he got. He should either acquire some competency or stop pretending he understands what he is writing.
Many estimates do rely on the assumption of a normal distribution in the sense that these estimates have characteristics (e.g. “unbiased” or “most efficient”) which are mathematically proven in the normal distribution case. If this assumption breaks down, these characteristics are no longer guaranteed. This does not mean that the estimates are now “bad” or useless; in many cases they are still the best you can do given the data.
To give a crude example, 100 is guaranteed to be the biggest number in the [1 .. 100] set of integers. If your set of integers is “from one to about a hundred, more or less”, 100 is no longer guaranteed to be the biggest, but it’s still not a bad estimate of the biggest number in that set.
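A quick simulation sketch of that point (mine, not the parent commenter's): with strongly skewed data the sample mean is still a perfectly sensible estimate, but the nominal 95% t-interval is no longer guaranteed to cover the true mean 95% of the time.

```python
# Coverage check: how often does a nominal 95% t-interval for the mean actually
# cover the true mean when the data are lognormal (skewed) and n is small?
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_mean = np.exp(0.5)          # mean of a lognormal(mu=0, sigma=1) distribution
n, trials, covered = 20, 10_000, 0

for _ in range(trials):
    x = rng.lognormal(0.0, 1.0, n)
    half_width = stats.t.ppf(0.975, n - 1) * x.std(ddof=1) / np.sqrt(n)
    covered += (x.mean() - half_width) <= true_mean <= (x.mean() + half_width)

print(f"Empirical coverage of the nominal 95% interval: {covered / trials:.1%}")
# Typically comes out noticeably below 95% for small samples from this distribution.
```

The estimate itself stays reasonable; it is the advertised guarantee that quietly degrades, which is exactly the distinction being drawn above.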
If a psych researcher finds “basic stats” confusing, he is not qualified to write a paper which looks at statistical interpretations of whatever results he got. He should either acquire some competency or stop pretending he understands what he is writing.
The problem is that psychology and statistics are different skills, and someone who is talented at one may not be talented at the other.
To give a crude example, 100 is guaranteed to be the biggest number in the [1 .. 100] set of integers. If your set of integers is “from one to about a hundred, more or less”, 100 is no longer guaranteed to be the biggest, but it’s still not a bad estimate of the biggest number in that set.
I take your point, but you can no longer say that 100 is the biggest number with 95% confidence, and this is the problem.
someone who is talented at one may not be talented at the other.
You don’t need to be talented, you only need to be competent. If you can’t pass even that low bar, maybe you shouldn’t publish papers which use statistics.
you can no longer say that 100 is the biggest number with 95% confidence, and this is the problem.
I don’t see any problem here.
First, 95% is an arbitrary number; it’s pure convention that does not correspond to any joint in the underlying reality.
Second, the t-test does NOT mean what most people think it means. See e.g. this or this.
Third, and most important, your certainty level should be entirely determined by the data. If your data does not support 95% confidence, then it does not. Trying to pretend otherwise is fraud.
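A small numerical illustration of that last point (my own sketch, not taken from the linked posts): for a one-sample, two-sided comparison against zero, the largest confidence level at which the interval still excludes zero is exactly 1 - p, so the data themselves fix how much confidence the claim can carry.

```python
# The data determine the supportable confidence level: the two-sided CI at
# level 1 - alpha excludes zero exactly when p < alpha, so the largest level
# at which it still excludes zero is 1 - p.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.normal(0.4, 1.0, 25)                    # a modest, noisy made-up effect
stat, p = stats.ttest_1samp(x, 0.0)
print(f"p = {p:.3f}; zero is excluded at up to {1 - p:.1%} confidence, no further")

for level in (0.90, 0.95, 0.99):
    lo, hi = stats.t.interval(level, len(x) - 1, loc=x.mean(), scale=stats.sem(x))
    print(f"{level:.0%} CI: ({lo:.2f}, {hi:.2f})   excludes zero: {lo > 0}")
```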
I had wondered whether the development of some sort of automated stats program would help. By this, I mean that instead of inputting the data and running a t-test manually, the program determines whether the data is approximately normally distributed, whether taking logs will transform it to a normal distribution, and so forth, before running the appropriate analysis and spitting out a write-up which can be dropped straight into the paper.
Sounds like the mythical Photoshop “Make Art” button.
Gelman says, and I quote, “...let me emphasize that I’m not saying that their claims (regarding the effects of ovulation) are false. I’m just saying that the evidence from their paper isn’t as strong as they make it out to be.” I think he would say this about 90%+ of papers in psych.
Yes. I think he would too. So much the worse for psychology.
And yet people are willing to take its pronouncements seriously.
Sounds like the mythical Photoshop “Make Art” button.
It was pointed out a long time ago that a programmer’s keyboard really needs to have a DWIM (Do What I Mean) key...