A Bayesian will have a probability distribution over possible outcomes, some of which give her lower scores than her probabilistic expectation of average score, and some of which give her higher scores than this expectation.
I am unable to parse your above claim, and ask for specific math on a specific example. If you know your score will be lower than you expect, you should lower your expectation. If you know something will happen less often than the probability you assign, you should assign a lower probability. This sounds like an inconsistent epistemic state for a Bayesian to be in.
I spent some time looking up papers, trying to find accessible ones. The main paper that kicked off the matching prior program is Welch and Peers, 1963, but you need access to JSTOR.
The best I can offer is the following example. I am estimating a large number of positive estimands. I have one noisy observation for each one; the noise is Gaussian with standard deviation equal to one. I have no information relating the estimands; per Jaynes, I give them independent priors, resulting in independent posteriors*. I do not have information justifying a proper prior. Let’s say I use a flat prior over the positive real line. No matter the true value of each estimand, the sampling probability of the event “my posterior 90% quantile is greater than the estimand” is less than 0.9 (see Figure 6 of this working paper by D.A.S. Fraser). So the more estimands I analyze, the more sure I am that the intervals from 0 to my posterior 90% quantiles will contain less than 90% of the estimands.
I don’t know if there’s an exact matching prior in this problem, but I suspect the problem lacks the structure needed for one to exist.
* This is a place I think Jaynes goes wrong: the quantities are best modeled as exchangeable, not independent. Equivalently, I put them in a hierarchical model. But this only kicks the problem of priors guaranteeing calibration up a level.
I’m sorry, but there is more frequentist gibberish in this paper than I would really like to work through.
If you could be so kind, please state:
What the Bayesian is using as a prior and likelihood function;
and what distribution the paper assumes the actual parameters are drawn from, and what real causal process governs the appearance of evidence.
If the two don’t match, then of course the Bayesian posterior distributions, relative to the experimenter’s higher knowledge, can appear poorly calibrated.
If the two do match, then the Bayesian should be well-calibrated. Sure looks QED-ish to me.
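(A minimal sketch of that claim, under illustrative assumptions not stated in this thread: give the parameters a standard Normal prior, actually draw them from that same prior, observe one Normal(mu, 1) data point each, and check that the posterior 90% quantiles cover the true values 90% of the time marginally.)

```python
# Minimal sketch, not from the thread: when the parameters really are drawn
# from the prior the Bayesian uses, posterior 90% quantiles cover the true
# values 90% of the time (marginally over parameter and data draws).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 200_000

mu = rng.normal(0.0, 1.0, size=n)   # true parameters drawn from the prior N(0, 1)
x = rng.normal(mu, 1.0)             # one observation per parameter, Normal(mu, 1)

# Conjugate posterior for each mu_i given x_i: Normal(x_i / 2, variance 1/2).
q = x / 2.0 + np.sqrt(0.5) * norm.ppf(0.9)   # posterior 90% quantiles

print("fraction of mu_i below their posterior 90% quantile:", np.mean(mu <= q))
# prints approximately 0.90
```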
The example doesn’t come from the paper; I made it myself. You only need to believe the figure I cited—don’t bother with the rest of the paper.
Call the estimands mu_1 to mu_n; the data are x_1 to x_n. The prior over the mu parameters is flat in the positive subset of R^n, zero elsewhere. The sampling distribution for x_i is Normal(mu_i,1). I don’t know the distribution the parameters actually follow. The causal process is irrelevant—I’ll stipulate that the sampling distribution is known exactly.
Call the 90% quantiles of my posterior distributions q_i. From the sampling perspective, these are random quantities, being monotonic functions of the data. Their sampling distributions satisfy the inequality Pr(q_i > mu_i | mu_i) < 0.9. (This is what the figure I cited shows.) As n goes to infinity, I become more and more sure that my posterior intervals of the form (0, q_i] are undercalibrated.
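Here is a sketch of how one could compute the q_i and estimate that sampling probability for any fixed mu_i (my own construction, not code from the cited paper). Under the flat prior on the positive half-line, the posterior for mu_i is a Normal(x_i, 1) truncated to (0, ∞), so its 90% quantile has a closed form:

```python
# Sketch (my own, not from the cited paper): posterior 90% quantile q under a
# flat prior on (0, inf) with a Normal(mu, 1) likelihood, plus a Monte Carlo
# estimate of the sampling probability Pr(q > mu | mu) for a fixed mu.
import numpy as np
from scipy.stats import norm

def posterior_quantile(x, p=0.9):
    """p-quantile of the posterior Normal(x, 1) truncated to (0, inf).

    The posterior CDF at q is (Phi(q - x) - Phi(-x)) / (1 - Phi(-x));
    setting it equal to p and solving for q gives the expression below.
    """
    return x + norm.ppf(norm.cdf(-x) + p * (1.0 - norm.cdf(-x)))

def coverage(mu, p=0.9, n_sim=200_000, seed=0):
    """Estimate Pr(q > mu | mu) by simulating x ~ Normal(mu, 1)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(mu, 1.0, size=n_sim)
    return float(np.mean(posterior_quantile(x, p) > mu))

if __name__ == "__main__":
    for mu in (0.5, 1.0, 2.0, 5.0):
        print(f"mu = {mu}: estimated Pr(q > mu) = {coverage(mu):.3f}")
```

Replacing posterior_quantile with x + norm.ppf(0.9) gives the unrestricted flat-prior case discussed next.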
You might cite the improper prior as the source of the problem. However, if the parameter space were unrestricted and the prior flat over all of R^n, the posterior intervals would be correctly calibrated: each posterior would be Normal(x_i, 1), so q_i = x_i + 1.28, and Pr(q_i > mu_i | mu_i) = Pr(x_i > mu_i - 1.28 | mu_i) is exactly 0.9.
But it really is fair to demand a proper prior. How could we determine that prior? Only by Bayesian updating from some pre-prior state of information to the prior state of information (or equivalently, by logical deduction, provided that the knowledge we update on is certain). Right away we run into the problem that Bayesian updating does not have calibration guarantees in general (and for this, you really ought to read the literature), so it’s likely that any proper prior we might justify does not have a calibration guarantee.