If a study finds a result significant at a p=0.05 level, that means they have followed a methodology which produces this conclusion correctly 95% of the time.
This smells like a confidence interval; is that what you had in mind? I think the standard definition of a p-value is more like “their methodology would produce this conclusion at most 5% of the time if the effect weren’t actually there.”
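To make that reading concrete, here is a minimal simulation sketch (my own illustration with an arbitrary two-sample t-test, not anything from the article or BW11): when there is no real effect, a correctly executed test crosses the 0.05 threshold only about 5% of the time.

```python
# Minimal sketch (illustrative assumption, not from the article): under the null
# hypothesis (no real effect), a correctly run test declares "significant at
# p < 0.05" roughly 5% of the time.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

n_experiments = 10_000
false_positives = 0
for _ in range(n_experiments):
    # Two groups drawn from the SAME distribution, i.e., the null is true.
    a = rng.normal(loc=0.0, scale=1.0, size=30)
    b = rng.normal(loc=0.0, scale=1.0, size=30)
    _, p = stats.ttest_ind(a, b)
    if p < 0.05:
        false_positives += 1

print(f"Fraction of null experiments declared significant: "
      f"{false_positives / n_experiments:.3f}")  # comes out near 0.05
```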
(rest of the article still holds; if their chance of drawing the wrong conclusion from their data is 15%, the claimed p-value becomes completely useless until their methodology is checked)
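A back-of-the-envelope version of that parenthetical, with made-up numbers (the 50% “a botched analysis lands on the wrong side” figure is purely an assumption for illustration): once the gross-error rate is on the order of 15%, it, not the nominal 5% threshold, dominates the chance of a wrong conclusion.

```python
# Rough sketch with assumed numbers (mine, not the article's or BW11's):
# how a 15% chance of a gross analysis error swamps a nominal 5% threshold.
p_correct_analysis = 0.85   # assumed: analysis executed correctly
alpha = 0.05                # nominal false-positive rate when executed correctly
p_gross_error = 0.15        # assumed: chance the analysis itself is botched
p_wrong_given_gross = 0.5   # assumed: a botched analysis is wrong about half the time

effective_error = p_correct_analysis * alpha + p_gross_error * p_wrong_given_gross
print(f"{effective_error:.3f}")  # ~0.12: dominated by the gross-error term, not alpha
```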
I’m here to say that this is not some property specific to p-values; it’s about the credibility of the communicator.
If scientists make a bunch of errors all the time, especially errors that change their conclusions, then indeed you can’t trust them. But it turns out (BW11) that scientists published in better journals are more credible than scientists published in worse journals, and the errors they make tend not to change the conclusions of the test (i.e., the chance of drawing a wrong conclusion from their data (“gross error” in BW11) was much lower than the headline error rate). And (admittedly I’m going out on a limb here) it is very possible that the errors which do change the conclusion of a particular test still don’t change the overall conclusion about the general theory. For example, if the theory says X, Y, and Z should happen, and you find support for X and Y while the marginal support for Z turns out not to be significant anymore, the theory is still pretty intact, unless you really insist on using p-values in a binary fashion. If the theory says X, Y, and Z should happen, and you find support for X and Y but what was solid support for Z is no longer significant, that’s more of an issue. But given how many tests are in a paper, it’s also possible that the theory says X, Y, and Z should happen, you find support for X, Y, and Z, and yet your conclusion about some side result W reverses, which may or may not say anything about your theory.
I don’t think it is wise to throw the baby out with the bathwater.