I am reminded of a paper by Simkin and Roychowdhury where they argued, on the basis of an analysis of misprints in scientific paper citations, that most scientists don’t actually read the papers they cite, but instead just copy the citations from other papers. From this they show that the wide citation of some papers in the literature can be explained by random chance alone.
Their evidence is not without flaws—the scientists might have just copied the citations for convenience, despite having actually read the papers. Still, we can easily imagine a similar effect arising if the scientists do read the papers they cite, but use the citation lists in other papers to direct their own reading. In that case, a paper that is read and cited once is more likely to be read and cited again, so a small number of papers acquire an unusual prominence independent of their inherent worth.
If we see a significant number of instances where the conclusions of a widely-accepted paper are later debunked by a simple test, then we might begin to suspect that something like this is happening.
I copy citations from other papers. When I can, I copy and paste BibTeX stanzas I find on the Web.
How so? Could you clarify your reasoning?
Scientists cite papers whose conclusions are convenient to cite (either because they corroborate their views or stake out a position to pivot from or argue against), whether or not those papers have been read. A paper with easily debunked conclusions might just as well have gone unread (and thus unexamined) as have been read (and simply trusted).
I think the real test for whether cited publications are read or not is this: if a publication is consistently cited for a conclusion it does not actually present, that is evidence that no one has actually read it.
I recall in my research that it was very convenient in the literature to cite one particular publication for a minor but foundational tenet in the field. However, when I finally got a hard-copy of the paper I couldn’t find this idea explicitly written anywhere. The thing is—contradicting what I say above, unfortunately—I think the paper was well-read, but people don’t double-check citations if the citation seems reasonable.
My thinking is: Given that a scientist has read (or looked at) a paper, they’re more likely to cite it if it’s correct and useful than if it’s incorrect. (I’m assuming that affirmative citations are more common than “X & Y said Z but they’re wrong because...” citations.) If that were all that happened, then the number of citations a paper gets would be strongly correlated with its correctness, and we would expect it to be rare for a bad paper to get a lot of citations. However, if we take into account the fact that citations are also used by other scientists as a reading list, then a paper that has already been cited a lot will be read by a lot of people, of whom some will cite it.
So when a paper is published, there are two forces affecting the number of citations it gets. First, the “badness effect” (“This paper sounds iffy, so I won’t cite it”) pushes down the number of citations. Second, the “popularity effect” (a lot of people have read the paper, so a lot of people are potential citers) pushes up the number of citations. The magnitude of the popularity effect depends mostly on what happens soon after publication, when readership is small and thus more subject to random variation. Of course, for blatantly erroneous papers the badness effect will still predominate, but in marginal cases (like the aphasia example) the popularity effect can swamp the badness effect. Hence we would expect to see more bad papers getting widely cited, and the more obviously bad a widely cited paper is, the stronger the popularity effect that must have carried it.
I suppose one could create a computer simulation if one were interested; I would predict results similar to Simkin & Roychowdhury’s.
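As a rough sketch of what such a simulation might look like (this is not Simkin & Roychowdhury’s actual model; the paper count, reference-list length, copy probability, and quality distribution below are all invented for illustration): each new paper finds candidate references either by following the reference lists of already-cited papers or by stumbling on papers at random, and cites a candidate only if, on inspection, it looks sound.

```python
import random

random.seed(0)

N_PAPERS = 5000      # papers published over the run (arbitrary assumption)
REFS_PER_PAPER = 20  # target reference-list length (arbitrary assumption)
P_COPY = 0.9         # chance a candidate is found via another paper's reference list (assumption)

quality = []          # per-paper probability that a reader who looks at it decides to cite it
citations = []        # citation count per paper
citation_events = []  # one paper index per citation; uniform sampling from this list
                      # picks papers in proportion to how often they are already cited

for _ in range(N_PAPERS):
    refs = set()
    for _ in range(10 * REFS_PER_PAPER):  # bounded attempts to fill the reference list
        if len(refs) >= REFS_PER_PAPER or not quality:
            break
        if citation_events and random.random() < P_COPY:
            # "Popularity effect": candidates come from the reference lists of
            # already-cited papers, so heavily cited papers get read more often.
            candidate = random.choice(citation_events)
        else:
            # Otherwise the author stumbles on a paper directly, uniformly at random.
            candidate = random.randrange(len(quality))
        # "Badness effect": having looked at it, the author cites it only if it seems sound.
        if random.random() < quality[candidate]:
            refs.add(candidate)
    for r in refs:
        citations[r] += 1
        citation_events.append(r)
    quality.append(random.uniform(0.1, 1.0))
    citations.append(0)

# Are the most-cited papers reliably the best ones, or just the earliest lucky ones?
top = sorted(range(N_PAPERS), key=citations.__getitem__, reverse=True)[:N_PAPERS // 100]
print("mean quality, all papers  :", round(sum(quality) / N_PAPERS, 3))
print("mean quality, top 1% cited:", round(sum(quality[i] for i in top) / len(top), 3))
```

Under these assumptions I would expect the most-cited papers to come out better than average but far from uniformly the best, since early luck feeds the popularity effect.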
I see: in the case where a paper is actually read, the judgment that it sounds iffy (and the resulting decision not to cite it) would correlate strongly with its conclusions actually being wrong.
I was considering that scientists rarely check the conclusions of the papers they cite by reading them, but just decide based on writing and other signals whether the source is credible. So a well-written paper with a wrong conclusion could get continued citations. But indeed, if the paper is written carefully and the methodology convincing, it would be less likely that the conclusion is wrong.
That’s great! I’ve wondered why so many mathematical papers (in non-math subject areas) contain misprints and omissions that make their equations uninterpretable. I wonder whether even the referees and editors read them.
And I have a confession. I didn’t read all of the papers I referenced!
Indeed this is commonplace for all academic fields, though I don’t see the problem with it, so long as the effect doesn’t squash new work.
A question: if referencing is not based on knowledge, or perhaps even relevance, what does this imply for Google’s algorithm?
Doesn’t it organize search results according to page links?
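Roughly, yes: PageRank treats a hyperlink much like a citation, and a page’s score is determined by the scores of the pages linking to it. Here is a minimal sketch of that idea on an invented four-page link graph (the textbook power-iteration formulation, not Google’s actual production system):

```python
# A toy link-based ranking in the spirit of PageRank. The four-page graph is
# invented for illustration; the core idea is that a page inherits importance
# from the pages that link to it, much as a paper inherits visibility from
# the papers that cite it.

DAMPING = 0.85  # damping factor used in the original PageRank formulation

# page -> pages it links to (hypothetical graph; every page has outgoing
# links, so no special handling of dangling nodes is needed here)
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

pages = list(links)
rank = {p: 1.0 / len(pages) for p in pages}

for _ in range(50):  # power iteration until the scores settle
    new_rank = {p: (1.0 - DAMPING) / len(pages) for p in pages}
    for page, outgoing in links.items():
        share = DAMPING * rank[page] / len(outgoing)
        for target in outgoing:
            new_rank[target] += share
    rank = new_rank

for page in sorted(rank, key=rank.get, reverse=True):
    print(page, round(rank[page], 3))
```

So the analogy with citation copying seems fair: heavily linked pages surface more often, which can earn them still more links, whether or not anyone has examined them closely.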