Confidence intervals (CIs) have frequently been proposed as a more useful alternative to null hypothesis significance testing (NHST), and their use is strongly encouraged in the APA Manual. Nevertheless, little is known about how researchers interpret CIs. In this study, 120 researchers and 442 students, all in the field of psychology, were asked to assess the truth value of six particular statements involving different interpretations of a CI. Although all six statements were false, both researchers and students endorsed, on average, more than three statements, indicating a gross misunderstanding of CIs. Self-declared experience with statistics was not related to researchers’ performance, and, even more surprisingly, researchers hardly outperformed the students, even though the students had not received any education on statistical inference whatsoever. Our findings suggest that many researchers do not know the correct interpretation of a CI.
...Falk and Greenbaum (1995) found similar results in a replication of Oakes’s study, and Haller and Krauss (2002) showed that even professors and lecturers teaching statistics often endorse false statements about the results from NHST. Lecoutre, Poitevineau, and Lecoutre (2003) found the same for statisticians working for pharmaceutical companies, and Wulff and colleagues reported misunderstandings among doctors and dentists (Scheutz, Andersen, & Wulff, 1988; Wulff, Andersen, Brandenhoff, & Guttler, 1987). Hoekstra et al. (2006) showed that in more than half of a sample of published articles, a nonsignificant outcome was erroneously interpreted as proof of the absence of an effect, and in about 20% of the articles, a significant finding was considered absolute proof of the existence of an effect. In sum, p-values are often misinterpreted, even by researchers who use them on a regular basis.
Falk, R., & Greenbaum, C. W. (1995). Significance tests die hard: The amazing persistence of a probabilistic misconception. Theory and Psychology, 5, 75–98.
Lecoutre, M.-P., Poitevineau, J., & Lecoutre, B. (2003). Even statisticians are not immune to misinterpretations of null hypothesis tests. International Journal of Psychology, 38, 37–45.
Scheutz, F., Andersen, B., & Wulff, H. R. (1988). What do dentists know about statistics? Scandinavian Journal of Dental Research, 96, 281–287.
Wulff, H. R., Andersen, B., Brandenhoff, P., & Guttler, F. (1987). What do doctors know about statistics? Statistics in Medicine, 6, 3–10.
...Our sample consisted of 442 bachelor students, 34 master’s students, and 120 researchers (i.e., PhD students and faculty). The bachelor students were first-year psychology students attending an introductory statistics class at the University of Amsterdam. These students had not yet taken any class on inferential statistics as part of their studies. The master’s students were completing a degree in psychology at the University of Amsterdam and, as such, had received a substantial amount of education on statistical inference in the previous 3 years. The researchers came from the universities of Groningen (n = 49), Amsterdam (n = 44), and Tilburg (n = 27).
...The questionnaire featured six statements, all of which were incorrect. This design choice was inspired by the p-value questionnaire from Gigerenzer (2004). Researchers who are aware of the correct interpretation of a CI should have no difficulty checking all “false” boxes. The (incorrect) statements are the following:
“The probability that the true mean is greater than 0 is at least 95%.”
“The probability that the true mean equals 0 is smaller than 5%.”
“The ‘null hypothesis’ that the true mean equals 0 is likely to be incorrect.”
“There is a 95% probability that the true mean lies between 0.1 and 0.4.”
“We can be 95% confident that the true mean lies between 0.1 and 0.4.”
“If we were to repeat the experiment over and over, then 95% of the time the true mean falls between 0.1 and 0.4.”
Statements 1, 2, 3, and 4 assign probabilities to parameters or hypotheses, something that is not allowed within the frequentist framework. Statements 5 and 6 mention the boundaries of the CI (i.e., 0.1 and 0.4), whereas, as was stated above, a CI can be used to evaluate only the procedure and not a specific interval. The correct statement, which was absent from the list, is the following: “If we were to repeat the experiment over and over, then 95% of the time the confidence intervals contain the true mean.”
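The correct interpretation above is a statement about the long-run behavior of the interval-constructing procedure, and it can be illustrated with a small simulation (our sketch, not part of the original study; the true mean, sample size, and number of repetitions are arbitrary assumptions). Each repetition draws a fresh sample and computes a 95% CI; roughly 95% of these intervals contain the true mean, while any single realized interval either contains it or does not.

```python
import random
import statistics

# Illustrative sketch (not from the paper): the frequentist "95%" refers to
# the procedure, not to any one interval. Parameter values are arbitrary.
random.seed(1)
TRUE_MEAN, SD, N, REPS = 0.25, 1.0, 50, 10_000

hits = 0
for _ in range(REPS):
    sample = [random.gauss(TRUE_MEAN, SD) for _ in range(N)]
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / N ** 0.5
    lo, hi = m - 1.96 * se, m + 1.96 * se  # normal-approximation 95% CI
    hits += lo <= TRUE_MEAN <= hi          # does this interval cover the true mean?

coverage = hits / REPS
print(f"coverage \u2248 {coverage:.3f}")  # close to 0.95
```

Note that the coverage guarantee attaches to the collection of intervals across repetitions; once a particular interval such as [0.1, 0.4] is observed, no probability statement about the true mean lying inside it is licensed within the frequentist framework.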
...The mean numbers of items endorsed for first-year students, master’s students, and researchers were 3.51 (99% CI = [3.35, 3.68]), 3.24 (99% CI = [2.40, 4.07]), and 3.45 (99% CI = [3.08, 3.82]), respectively. The item endorsement proportions are presented per group in Fig. 1. Notably, despite the first-year students’ complete lack of education on statistical inference, they clearly do not form an outlying group...Indeed, the correlation between the number of items endorsed and self-declared experience was even slightly positive (0.04; 99% CI = [−0.20, 0.27]), contrary to what one would expect if experience decreased the number of misinterpretations.
“Robust misinterpretation of confidence intervals”, Hoekstra et al 2014