I’m at a loss for words at the inanity of this policy.
It is a policy that doesn’t just exist in psychology. Some journals in other fields have similar policies requiring that the work include something more than just a replication of the study in question, but my impression is that this is much more common in the less rigorous areas like psychology. Journals probably do this because they want to be considered cutting edge, and they get less of that if they publish replication attempts. Given that, it makes some sense to reject both successful and unsuccessful replications, since if one only included unsuccessful replications then there would be a natural publication bias. So they more or less successfully fob the whole thing off on other journals. (There’s something like an n-player prisoner’s dilemma here, with journals as the players trying to decide whether to accept replications in general.) So this is bad, but it is understandable when one remembers that journals are driven by selfish, status-driven humans, just like everything else in the world.
Yes, this is a standard incentives problem. But one to keep in mind when parsing the literature.
What rules of thumb do you use to ‘keep this in mind’? I generally try to never put anything in my brain that just has one or two studies behind it. I’ve been thinking of that more as ‘it’s easy to make a mistake in a study’ and ‘maybe this author has some bias that I am unaware of’, but perhaps this cuts in the opposite direction.
Actually, even with many studies and a meta-analysis, you can still get blindsided by publication bias. There are plenty of psi meta-analyses showing positive effects (with studies that were not pre-registered, and are probably very selected), and many more in medicine and elsewhere.
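To make the publication-bias point concrete, here is a minimal sketch in Python. Everything in it is an assumption chosen for illustration: the true effect is exactly zero, each study's observed mean is drawn directly from a normal distribution with a known standard error, and a study is "published" only if it comes out significantly positive, which is a deliberate caricature of real selection. The function name `simulate_studies` is hypothetical.

```python
import math
import random


def simulate_studies(n_studies, n_per_study, true_effect, rng):
    """Simulate study means and 'publish' only significantly positive ones.

    Assumes each outcome has standard deviation 1, so a study's observed
    mean has standard error 1 / sqrt(n_per_study).
    """
    se = 1.0 / math.sqrt(n_per_study)
    all_results = []
    published = []
    for _ in range(n_studies):
        observed = rng.gauss(true_effect, se)
        all_results.append(observed)
        # Caricature of selection: only z > 1.96 reaches a journal.
        if observed / se > 1.96:
            published.append(observed)
    return all_results, published


rng = random.Random(0)  # fixed seed so the run is repeatable
all_results, published = simulate_studies(2000, 25, 0.0, rng)

print("studies run:", len(all_results))
print("studies published:", len(published))
print("mean of all studies:", sum(all_results) / len(all_results))
if published:
    print("mean of published studies:", sum(published) / len(published))
```

Under these assumptions the average over all studies sits near zero, while every published study shows an effect above 1.96 standard errors by construction, so a meta-analysis restricted to the published subset would report a positive effect that does not exist.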
If it’s something I’d trust an idiot to reach the right conclusion on given good data, I’ll look for meta-analyses, p<<0.05, or do a quick and dirty meta-analysis myself if the number of studies is sufficiently small. If it’s something I’m surprised has even been tested, I’ll give one study more weight. If it’s something that I’d expect to be tested a lot, I’ll give it less. If the data I’m looking for is orthogonal to the data they’re being published for, it probably doesn’t suffer from selection bias, so I’ll take it at face value. If the study’s result is ‘convenient’ in some way for the source that showed it to me, I’ll be more skeptical of selection bias and misinterpretation.
If it’s a topic where I see that methodological flaws or interpretation errors are very easy to make, then I’ll try to actually dig in and look for them and see if there’s a new obvious set of conclusions to draw.
Separately from determining how strong the evidence is, if there’s only a study or two I’ll still ‘put it in my brain’ when it’s testing a hypothesis I already suspected of being true, or when it makes too much sense in hindsight (aka high priors); otherwise I’ll put it in my brain with a ‘probably untrue but something to watch out for’ tag.
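For what a "quick and dirty" meta-analysis might look like in practice, one common choice is fixed-effect inverse-variance pooling. The sketch below is not necessarily what the commenter does; it assumes each study reports an effect estimate and a standard error on the same scale, that all studies estimate one shared effect, and that a normal approximation is acceptable. The function name `fixed_effect_meta` and the three studies' numbers are hypothetical.

```python
import math


def fixed_effect_meta(effects, std_errors):
    """Pool per-study effect estimates with inverse-variance weights.

    Returns (pooled_effect, pooled_standard_error, z, two_sided_p).
    """
    # More precise studies (smaller standard error) get more weight.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled / pooled_se
    # Two-sided p-value under a normal approximation.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled, pooled_se, z, p


# Hypothetical effect sizes and standard errors for three small studies.
effects = [0.30, 0.10, 0.25]
std_errors = [0.15, 0.10, 0.20]

pooled, pooled_se, z, p = fixed_effect_meta(effects, std_errors)
print(f"pooled effect = {pooled:.3f} +/- {pooled_se:.3f}, z = {z:.2f}, p = {p:.3f}")
```

This kind of pooling does nothing about selection: if the studies that reached you were filtered for positive results, the pooled estimate inherits that bias.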
How much money do you think it would take to give replications a journal with status on par with the new-studies-only ones?
Or alternately, how much advocacy of what sort? Is there someone in particular to convince?
It’s not something you can simply buy with money. It’s about getting scientists to cite papers in the replications journal.
What about influencing high-status actors (e.g. prominent universities)? I don’t know what the main influence points are for an academic journal, and I don’t know what things it’s considered acceptable for a university to accept money for, but it seems common to endow a professorship or a (quasi-academic) program.
Probably this method would cost many millions of dollars, but it would be interesting to know the order of magnitude required.
We simply do not have a scientific process any more.
This is both unfair to scientists and inaccurate. In 2011, we’ve had such novel scientific discoveries as snails that can survive being eaten by birds, we’ve estimated the body temperature of dinosaurs, we’ve captured the most detailed picture of a dying star ever taken, and we’ve made small but significant progress toward resolving P ?= NP. These are but a few of the highlights that happened both to be in my recent memory and to be easy to locate links to. I’ve also not included anything that could be argued to be engineering rather than science. There are many achievements just like this.
Why might it seem like we don’t have a scientific process?
First, there’s simple nostalgia. As I write this, the space shuttle is on its very last mission. I suspect that almost everyone here either longs for the days of their youth when humans walked on the moon, or wishes they had lived then to witness it. Thus, the normal human nostalgia is wrapped up in some actual problems of stagnation and lack of funding. This creates a potential halo effect for the past.
Second, as the number of scientists increases over time, the number of scientists who are putting out poor science will increase. Similarly, the amount of stuff that gets through peer review even when it shouldn’t will increase as the number of journals and the number of papers submitted go up. So the absolute amount of bad science will go up.
Third, the internet and similar modern communication technologies let us find out about so-called bad science much faster than we would otherwise. Much of it would once have been buried in obscure journals, but instead we have bloggers commenting and respected scientists responding. So as time goes on, even if the amount of bad science stays constant, the perception would be of an increase.
I would go so far as to venture that we might have a more robust and widespread scientific process than at any other time in history. To put the Bem study in perspective, keep in mind that a hundred years ago, psychology wasn’t even trying to use statistical methods; look at how Freud and Jung’s ideas were viewed. Areas like sociology and psychology have if anything become more scientific over time. From that standpoint, a paper that uses statistics in a flawed fashion is indicative of how much progress the soft sciences have made in terms of being real sciences in that one needs bad stats to get bad ideas through rather than just anecdotal evidence.
To paraphrase someone speaking on a completely different issue, the arc of history is long, but it bends towards science.
That’s not really true. Experimental, quantitative, and even fairly advanced statistical methods were definitely used in psychology a century ago. (As a notable milestone, Spearman’s factor analysis that started the still ongoing controversy over the general factor of intelligence was published in 1904.) My impression is that ever since Wilhelm Wundt’s pioneering experimental work that first separated psychology from philosophy in the late 19th century, psychology has been divided between quantitative work based on experiment and observation, which makes at least some pretense of real science, and quack soft stuff that’s usually presented in a medical or ideological context (or some combination thereof). Major outbursts of the latter have happened fairly recently—remember the awful “recovered memories” trend in the 1980s and 1990s (and somewhat even in the 2000s) and its consequences.
But more importantly, I’m not at all sure that the mathematization of soft fields has made them more scientific. One could argue that the contemporary standards for using statistics in soft fields only streamline the production of plausible-looking nonsense. Even worse, sometimes mathematization leads to pseudoscience that has no more connection to reality than mere verbal speculations and sophistries, but looks so impressive and learned that a common-sense criticism can be effectively met with scorn and stonewalling. As the clearest example, it appears evident that macroeconomics is almost complete quackery despite all the abstruse statistics and math used in it, and I see no evidence that the situation in other wannabe-exact soft fields is much better. Or to take another example, at one point I got intensely interested in IQ-related controversies and read a large amount of academic literature in the area—eventually finding that the standards of statistics (and quantitative reasoning in general) on all sides in the controversy are just depressingly bad, often hiding awful lapses of reasoning that would be unimaginable in a real hard science behind a veneer of seeming rigor.
(And ultimately, I notice that your examples of recent discoveries are from biology, astronomy/physics, and math—fields whose basic soundness has never been in doubt. But what non-trivial, correct, and useful insight has come from all these mathematized soft fields?)
This is a very good point. You make a compelling case that the use of careful statistics is not a recent trend in psychology. In that regard, my penultimate paragraph is clearly just deeply and irrecoverably wrong.
Well, I was responding to Eliezer’s claim about a general lack of a scientific process. So the specific question then becomes whether one can give examples of “non-trivial, correct, and useful” psychological results that have occurred in the last year or so. There’s a steady output of decent psychology results. While the early work on cognitive biases was done in the 1980s by Kahneman and Tversky, a lot of work has been done in the last decade. But I agree that the output is slow enough that I can’t point, off the top of my head, to easy, impressive studies from the last few months like I can for other areas of research. Sharon Bertsch and Bryan Pesta’s investigation of different explanations for negative correlation between IQ and religion came out in 2009 and 2010, which isn’t even this year.
However, at the same time, I’m not sure that this is a strike against psychology. Psychology has a comparatively small field of study. Astronomy gets to investigate most of the universe. Math gets to investigate every interesting axiomatic system one can imagine. Biology gets to investigate millions of species. Psychology just gets to investigate one species, and only certain aspects of that species. When psychology does investigate other intelligent species it is often categorized as belonging to other areas. So we shouldn’t be that surprised if psychology doesn’t have as high a production rate. On the other hand, this argument isn’t very good because one could make up for it by lumping all the classical soft sciences together into one area, and one would still have this problem. So overall, your point seems valid with regard to psychology.
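I didn’t have in mind just psychology; I was responding to your comment about soft and wannabe-hard fields in general. In particular, this struck me as unwarranted optimism:
[A] paper that uses statistics in a flawed fashion is indicative of how much progress the soft sciences have made in terms of being real sciences in that one needs bad stats to get bad ideas through rather than just anecdotal evidence.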
That is true if these sciences are nowadays overwhelmingly based on sound math and statistics, and these bad stats papers are just occasional exceptions. The pessimistic scenario I have in mind is the emergence of bogus fields in which bad formalism is the standard—i.e., in which verbal bad reasoning of the sort seen in, say, old-school Freudianism is replaced by standardized templates of bad formalism. (These are most often, but not always, in the form of bad statistics.)
This, in my opinion, results in an even worse situation. Instead of bad verbal reasoning, which can be criticized convincingly in a straightforward way, as an outside critic you’re now faced with an abstruse bad formalism. This not only makes it more difficult to spot the holes in the logic, but even if you identify them correctly, the “experts” can sneer at you and dismiss you as a crackpot, which will sound convincing to people who haven’t taken the effort to work through the bad formalism themselves.
Unless you believe that such bogus fields don’t exist (and I think many examples are fairly obvious), they are clear counterexamples to your above remark. Their “mathematization” has resulted in bullshit being produced in even greater quantities, and shielded against criticism far more strongly than if they were still limited to verbal sophistry.
Another important point, which I think you’re missing, concerns your comment about problematic fields having a relatively small and arguably less important scope relative to the (mostly) healthy hard fields. The trouble is, the output of some of the most problematic fields is used to direct the decisions and actions of the government and other powerful institutions. From miscarriages of justice due to pseudoscience used in courts to catastrophic economic crises, all kinds of calamities can directly follow from this.
No substantial disagreement with most of your comment. I will just note that most of your points (which do show that I was being overly optimistic) don’t as a whole substantially undermine the basic point being made about Eliezer’s claim.
I think your point about small fields being able to do damage is an interesting one (and one I’ve never seen before) and raises all sorts of issues that I’ll need to think about.
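In 2011, we’ve had such novel scientific discoveries as snails that can survive being eaten by birds, we’ve estimated the body temperature of dinosaurs
(...)
Sharon Bertsch and Bryan Pesta’s investigation of different explanations for negative correlation between IQ and religion came out in 2009 and 2010, which isn’t even this year.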
Have these results been replicated? Are you sure they’re correct? Merely citing cool-looking results isn’t evidence that the scientific process is working.
Remember, “the scientific process not working” doesn’t look like “cool results stop showing up”, but looks like “cool results keep showing up except they no longer correspond to reality”. If you have no independent way of verifying the results in question, it’s hard to tell the above scenarios apart.
Bertsch and Pesta’s work has been replicated. The dinosaur temperature estimate is close to estimates made by other techniques—the main interesting thing here is that this is a direct estimate made using the fossil remains rather than working off of metabolic knowledge, body size, and the like. So the dinosaur temperature estimate is in some sense the replication by another technique of strongly suspected results. The snail result is very new; I’m not aware of anything that replicates it.
Hearing one probably bad thing and deciding we fell from grace and should shake our heads in bitter nostalgia? That’s what the villains do. We, the ingroup of Truth and Right and Apple Pie, are dutifully skeptical of claims of increasing stupidity. You taught me that.
Even if true that wouldn’t be simple.
What do you mean by that?
To-the-point responses = encouraging to me that you’re busy :)