Now, if that is a fair summary, then this big controversy between frequentists and Bayesians must mean that there is a sizable collection of people who think that the above procedure is a better way of obtaining knowledge than performing Bayesian updates.
Not necessarily better. Just more convenient for the thumbs up/thumbs down way of looking at evidence that scientists tend to like.
But for the life of me, I can’t see how anyone could possibly think that. I mean, not only is the “p-value” threshold arbitrary,
It’s a convention. The point is to have a pre-agreed, low significance level so that testers can’t screw with the result of a test by arbitrarily jacking the significance level up (if they want to reject a hypothesis) or turning it down (if they don’t). The significance level has to be low to minimize the risk of a type I error.
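To make the type I error point concrete, here is a minimal simulation sketch. Everything in it is an assumption chosen for illustration, not something from the discussion: a two-sided z-test with known unit variance, a sample size of 30, and the hypothetical function names `two_sided_p_value` and `false_positive_rate`. It draws data for which the null hypothesis is true and counts how often the test rejects anyway.

```python
import math
import random


def two_sided_p_value(z):
    """Two-sided p-value for a standard normal test statistic z."""
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rate(alpha, n=30, trials=10000, seed=0):
    """Fraction of simulated true-null experiments rejected at level alpha.

    Illustrative assumptions: data are N(0, 1), the null is mean = 0,
    and the variance is known to be 1.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # z = sample mean / (sigma / sqrt(n)) with sigma = 1
        z = sum(sample) / math.sqrt(n)
        if two_sided_p_value(z) < alpha:
            rejections += 1
    return rejections / trials


if __name__ == "__main__":
    for alpha in (0.01, 0.05, 0.20):
        print(alpha, false_positive_rate(alpha))
```

Under these assumptions the rejection rate should land close to whatever alpha was chosen, which is why letting a tester pick alpha after seeing the data amounts to letting them pick their own false positive rate.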
not only are we depriving ourselves of valuable information by “accepting” or “not accepting” a hypothesis rather than quantifying our certainty level,
The certainty level is effectively communicated via the significance level and p-value itself. (And the use of a reject vs. don’t reject dichotomy can be desirable if one wishes to decide between performing some action and not performing it based on some data.)
but...what about P(E|H)?? (Not to mention P(H).) To me, it seems blatantly obvious that an epistemology (and that’s what it is) like the above is a recipe for disaster—specifically in the form of accumulated errors over time.
A frequentist can deal in likelihoods, for example by doing hypothesis tests of likelihood ratios. As for priors, a frequentist encapsulates them in parametric and sampling assumptions about the data. A Bayesian might give a low weight to a positive result from a parapsychology study because of their “low priors”, but a frequentist might complain about sampling procedures or cherry-picking being more likely than a true positive. As I see it, the two say essentially the same thing; the frequentist is just being more specific than the Bayesian.
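For readers who have not seen a hypothesis test built on a likelihood ratio, here is a rough sketch of one. The setup is assumed purely for illustration: binomial data, a null value of 0.5 for the success probability, and the usual large-sample chi-square approximation (one degree of freedom) for the test statistic. The function names `binomial_lr_statistic` and `lr_test_p_value` are hypothetical.

```python
import math


def _xlogy(count, ratio_num, ratio_den):
    """count * log(ratio_num / ratio_den), treating 0 * log(0) as 0."""
    if count == 0:
        return 0.0
    return count * math.log(ratio_num / ratio_den)


def binomial_lr_statistic(successes, n, p0):
    """2 * (max log-likelihood - log-likelihood at p0) for binomial data."""
    p_hat = successes / n  # maximum-likelihood estimate
    failures = n - successes
    log_ratio = (_xlogy(successes, p_hat, p0)
                 + _xlogy(failures, 1.0 - p_hat, 1.0 - p0))
    return 2.0 * log_ratio


def lr_test_p_value(successes, n, p0=0.5):
    """Approximate p-value from a chi-square(1) reference distribution."""
    stat = binomial_lr_statistic(successes, n, p0)
    # For one degree of freedom, P(chi2 > s) = erfc(sqrt(s / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


if __name__ == "__main__":
    print(lr_test_p_value(60, 100))  # 60 successes in 100 trials
    print(lr_test_p_value(50, 100))  # data exactly at the null value
```

The likelihood only enters through the ratio of the best-fitting model to the null model; no prior appears anywhere, which is the sense in which this is a frequentist use of likelihoods.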
The certainty level is effectively communicated via the significance level and p-value itself.
No. P-values are not equivalent when they are calculated using different statistics, or even the same statistic but a different sample size. On the latter point see Royall, 1986.
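One way to see the sample-size point (a sketch under my own assumptions, not necessarily the argument in the cited paper): fix the p-value and ask what the likelihood ratio between a specific alternative and the null looks like at different sample sizes. The assumed setup is normal data with known unit variance, null mean 0, and an arbitrary point alternative `mu1 = 0.5`; the name `likelihood_ratio` is hypothetical.

```python
import math


def likelihood_ratio(z, n, mu1):
    """Likelihood of mean = mu1 relative to mean = 0, given z and n.

    Assumes n i.i.d. N(mu, 1) observations, so the sample mean is
    xbar = z / sqrt(n) and the ratio is exp(n * (xbar * mu1 - mu1**2 / 2)).
    """
    xbar = z / math.sqrt(n)
    return math.exp(n * (xbar * mu1 - mu1 ** 2 / 2.0))


if __name__ == "__main__":
    z = 1.96  # two-sided p of about 0.05 at every sample size
    for n in (10, 100):
        print(n, likelihood_ratio(z, n, mu1=0.5))
```

With these assumed numbers, the same z of 1.96 favors the alternative over the null when n = 10 and favors the null over the alternative when n = 100, so an identical p-value can point in opposite directions once the sample size changes.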
As I see it, the two say essentially the same thing; the frequentist is just being more specific than the Bayesian.
I’d say the frequentist is using Bayesian reasoning informally; Jaynes discusses this exact problem from a Bayesian perspective at the beginning of Chapter 5 of his magnum opus.
No. P-values are not equivalent when they are calculated using different statistics, or even the same statistic but a different sample size. On the latter point see Royall, 1986.
Sorry. You are quite right, and I was sloppy. I had in mind the implicit idea that holding the choices of statistical test and data collection procedure constant, different p-values suggest how strongly one should reject the null hypothesis, and I should have made that explicit. It is absolutely true that if I just ask someone, “Test A gave me p = 0.008 and Test B gave me p = 0.4, which test’s null hypothesis is worse off?”, the correct answer is “how should I know?”
I’d say the frequentist is using Bayesian reasoning informally; Jaynes discusses this exact problem from a Bayesian perspective at the beginning of Chapter 5 of his magnum opus.
Yep. I think this is an example of the frequentist encapsulating what a Bayesian would call priors in their sampling assumptions.