If it helps, I think this is an example of a problem where the two approaches give different answers to the same question. From Jaynes; see http://bayes.wustl.edu/etj/articles/confidence.pdf , page 22 for the details, and please let me know if I’ve erred or misinterpreted the example.
Three identical components. You run them through a reliability test and they fail at times 12, 14, and 16 hours. You know that these components fail in a particular way: they last at least X hours, then have a remaining lifetime that you assess as exponentially distributed with a mean of 1 hour. What is the shortest 90% confidence interval / probability interval for X, the time of guaranteed safe operation?
Frequentist 90% confidence interval: 12.1 hours − 13.8 hours
Bayesian 90% probability interval: 11.2 hours − 12.0 hours
Note: the frequentist interval has the strange property that we know for sure that the 90% confidence interval does not contain X (from the data we know that X ≤ 12). The Bayesian interval seems to match our common sense better.
Heh, that’s a cheeky example. To explain why it’s cheeky, I have to briefly run through it, which I’ll do here (using Jaynes’s symbols so whoever clicked through and has pages 22-24 open can directly compare my summary with Jaynes’s exposition).
Call N the sample size and θ the minimum possible widget lifetime (what bill calls X). Jaynes first builds a frequentist confidence interval around θ by defining the unbiased estimator θ∗, which is the observations’ mean minus one. (Subtracting one corrects for the sample mean sitting above θ: each observation exceeds θ by an Exp(1) wear-out time whose mean is 1.) θ∗’s sampling distribution turns out to be proportional to y^(N-1) exp(-Ny), where y = θ∗ - θ + 1 (the normalizing constant is N^N/(N-1)!). Note that y is essentially a measure of how far our estimator θ∗ is from the true θ, so Jaynes now has a pdf for that. Jaynes integrates that pdf to get y’s cdf, which he calls F(y). He then makes the 90% CI by computing the shortest [y1, y2] such that F(y2) - F(y1) = 0.9. That gives [0.1736, 1.8529]. Substituting in N and θ∗ for the sample and a little algebra (to get a CI for θ rather than y, via θ = θ∗ + 1 - y) gives his θ CI of [12.1471, 13.8264].
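To make concrete what that 90% does and doesn’t promise, here’s a quick simulation sketch (mine, not Jaynes’s; Python with numpy, and the seed, true θ, and trial count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
N, theta_true, trials = 3, 10.0, 100_000

# Each lifetime is theta (guaranteed hours) plus an Exp(mean 1) wear-out time.
x = theta_true + rng.exponential(scale=1.0, size=(trials, N))

theta_star = x.mean(axis=1) - 1.0   # Jaynes's unbiased estimator
lo = theta_star - 0.8529            # offsets implied by [y1, y2] above
hi = theta_star + 0.8264

# Coverage: the recipe captures theta in ~90% of repeated samples...
print(np.mean((lo <= theta_true) & (theta_true <= hi)))
# ...yet in the samples where lo exceeds the smallest observed lifetime,
# the interval provably excludes theta -- the pathology bill's data shows.
print(np.mean(lo > x.min(axis=1)))  # small but nonzero fraction
```

The recipe is calibrated on average over repeated datasets, which is all the confidence-interval guarantee ever was; it says nothing about any particular dataset.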
For the Bayesian CI, Jaynes takes a constant prior, then jumps straight to the posterior being N exp(N(θ - x1)) for θ ≤ x1 (and zero above it, since no component can fail before θ), where x1 is the smallest lifetime in the sample (12 in this case). He then comes up with the smallest interval that encompasses 90% of the posterior probability, and it turns out to be [11.23, 12].
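For concreteness, here’s the mechanical version of that calculation (my sketch, not code from the paper):

```python
import numpy as np

x = np.array([12.0, 14.0, 16.0])
N, x1 = len(x), x.min()

# Flat prior times the product of Exp(1) likelihoods gives the posterior
# p(theta | data) = N * exp(N * (theta - x1)) for theta <= x1, zero above.
# The density rises all the way up to x1, so the shortest 90% interval is
# pinned at x1; solve exp(N * (lo - x1)) = 0.1 for its lower end.
lo = x1 + np.log(0.1) / N
print(lo, x1)  # ~11.2325 and 12.0, i.e. Jaynes's [11.23, 12]
```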
Jaynes rightly observes that the Bayesian CI accords with common sense, and the frequentist CI does not. This comparison is what feels cheeky to me.
Why? Because Jaynes has used different estimators for the two methods [edit: I had previously written here that Jaynes implicitly used different estimators, but this is actually false; when he discusses the example subsequently (see p. 25 of the PDF) he fleshes out this point in terms of sufficient v. non-sufficient statistics.]. For the Bayesian CI, Jaynes effectively uses the minimum lifetime as his estimator for θ (by defining the likelihood to be solely a function of the smallest observation, instead of all of them), but for the frequentist CI, he explicitly uses the mean lifetime minus 1. If Jaynes-as-frequentist had happened to use the maximum likelihood estimator—which turns out to be the minimum lifetime here—instead of an arbitrary unbiased estimator, he would’ve gotten precisely the same result as Jaynes-as-Bayesian.
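To back that last claim up with a sketch (mine, not from the paper; it uses the fact that the sample minimum m satisfies m - θ ~ Exp(rate N)):

```python
import numpy as np

x = np.array([12.0, 14.0, 16.0])
N, m = len(x), x.min()

# m - theta is the minimum of N Exp(1) variables, i.e. Exp(rate N).
# That density decreases monotonically, so the shortest 90% interval
# for m - theta is [0, q], where 1 - exp(-N * q) = 0.9.
q = -np.log(0.1) / N
print(m - q, m)  # ~11.2325 and 12.0: identical to the Bayesian interval
```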
So it seems to me that the exercise just demonstrates that Bayesianism-done-slyly outperformed frequentism-done-mindlessly. I can imagine that if I had tried to do the same exercise from scratch, I would have ended up faux-proving the reverse: that the Bayesian CI was dumber than the frequentist’s. I would’ve just picked up a boring, old-fashioned, not especially Bayesian reference book to look up the MLE, and used its sampling distribution to get my frequentist CI: that would’ve given me the common sense CI [11.23, 12]. Then I’d construct the Bayesian CI by mechanically defining the likelihood as the product of the individual observations’ likelihoods. That last step, I am pretty sure but cannot immediately prove, would give me a crappy Bayesian CI like [12.1471, 13.8264], if not that very interval.
Ultimately, at least in this case, I reckon your choice of estimator is far more important than whether you have a portrait of Bayes or Neyman on your wall.
[Edited to replace my asterisks with ∗ so I don’t mess up the formatting.]
So it seems to me that the exercise just demonstrates that Bayesianism-done-slyly outperformed frequentism-done-mindlessly.
This example really is Bayesianism-done-straightforwardly. The point is that you really don’t need to be sly to get reasonable results.
For the Bayesian CI, Jaynes takes a constant prior, then jumps straight to the posterior being N exp(N(θ - x1))
A constant prior ends up using only the likelihoods. The jump straight to the posterior is a completely mechanical calculation: just products and normalization.
Then I’d construct the Bayesian CI by mechanically defining the likelihood as the product of the individual observations’ likelihoods.
Each individual likelihood is zero for x < θ (a component cannot fail before θ), so the product is zero whenever θ exceeds the smallest observation x1. You will get out the same pdf as Jaynes. CIs can be constructed in many ways from pdfs, but constructing the smallest one will give you the same one as Jaynes.
EDIT: for full effect, please do the calculation yourself.
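For anyone who wants to check their work against something, here’s one way to grind the product out numerically (a rough sketch; the grid bounds and resolution are arbitrary choices of mine):

```python
import numpy as np

x = np.array([12.0, 14.0, 16.0])
N, x1 = len(x), x.min()

theta = np.linspace(8.0, 13.0, 100_001)  # wide enough to hold ~all the mass
dt = theta[1] - theta[0]

# Product of the individual likelihoods, support included: each factor
# is exp(theta - xi) for theta <= xi and zero otherwise.
like = np.ones_like(theta)
for xi in x:
    like *= np.where(theta <= xi, np.exp(theta - xi), 0.0)

post = like / (like.sum() * dt)  # flat prior, then normalize
jaynes = np.where(theta <= x1, N * np.exp(N * (theta - x1)), 0.0)
print(np.abs(post - jaynes).max())  # ~0, up to grid error
```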
I stopped reading cupholder’s comment before the last paragraph (to write my own reply) and completely missed this! D’oh!
Jaynes does go on to discuss everything you have pointed out here. He noted that confidence intervals had commonly been held not to require sufficient statistics, pointed out that some frequentist statisticians had been doubtful on that point, and remarked that if the frequentist estimator had been the sufficient statistic (the minimum lifetime) then the results would have agreed. I think the real point of the story is that he ran through the frequentist calculation for a group of people who did this sort of thing for a living and shocked them with it.
You got me: I didn’t read the what-went-wrong subsection that follows the example. (In my defence, I did start reading it, but rolled my eyes and stopped when I got to the claim that “there must be a very basic fallacy in the reasoning underlying the principle of confidence intervals”.)
I suspect I’m not the only one, though, so hopefully my explanation will catch some of the eyeballs that didn’t read Jaynes’s own post-mortem.
[Edit to add: you’re almost certainly right about the real point of the story, but I think my reply was fair given the spirit in which it was presented here, i.e. as a frequentism-v.-Bayesian thing rather than an orthodox-statisticians-are-taught-badly thing.]
Independently reproducing Jaynes’s analysis is excellent, but calling him “cheeky” for “implicitly us[ing] different estimators” is not fair given that he’s explicit on this point.
....given the spirit in which it was presented here, i.e. as a frequentism-v.-Bayesian thing rather than an orthodox-statisticians-are-taught-badly thing.
It’s a frequentism-v.-Bayesian thing to the extent that correct coverage is considered a sufficient condition for good frequentist statistical inference. This is the fallacy that you rolled your eyes at; the room full of shocked frequentists shows that it wasn’t a strawman at the time. [ETA: This isn’t quite right. The “v.-Bayesian” part comes in when correct coverage is considered a necessary condition, not a sufficient condition.]
ETA:
I suspect I’m not the only one, though, so hopefully my explanation will catch some of the eyeballs that didn’t read Jaynes’s own post-mortem.
This is a really good point, and it makes me happy that you wrote your explanation. For people for whom Jaynes’s phrasing gets in the way, your phrasing bypasses the polemics and lets them see the math behind the example.
Independently reproducing Jaynes’s analysis is excellent, but calling him “cheeky” for “implicitly us[ing] different estimators” is not fair given that he’s explicit on this point.
I was wrong to say that Jaynes implicitly used different estimators for the two methods. After the example he does mention it, a fact I missed due to skipping most of the post-mortem. I’ll edit my post higher up to fix that error. (That said, at the risk of being pedantic, I did take care to avoid calling Jaynes-the-person cheeky. I called his example cheeky, as well as his comparison of the frequentist CI to the Bayesian CI, kinda.)
It’s a frequentism-v.-Bayesian thing to the extent that correct coverage is considered a sufficient condition for good frequentist statistical inference. This is the fallacy that you rolled your eyes at; the room full of shocked frequentists shows that it wasn’t a strawman at the time. [ETA: This isn’t quite right. The “v.-Bayesian” part comes in when correct coverage is considered a necessary condition, not a sufficient condition.]
When I read Jaynes’s fallacy claim, I didn’t interpret it as saying that treating coverage as necessary/sufficient was fallacious; I read it as arguing that the use of confidence intervals in general was fallacious. That was what made me roll my eyes. [Edit to clarify: that is, I was rolling my eyes at what I felt was a strawman, but a different one to the one you have in mind.] Having read his post-mortem fully and your reply, I think my initial, eye-roll-inducing interpretation was incorrect, though it was reasonable on first read-through given the context in which the “fallacy” statement appeared.
Fair point.
excellent paper, thanks for the link.
My intuition would be that the interval should be bounded above by 12 - epsilon, since it seems unlikely (probability zero?) that we got a component that failed at exactly the theoretically fastest possible time.
You can treat the interval as open at 12.0 if you like; it makes no difference.
If by epsilon you mean a specific number greater than 0, the only reason to shave an interval of length epsilon off the high end of the confidence interval is if you can win back the probability it contained from a shorter interval added at the low end. (I haven’t worked through the math, and the pdf link is giving me “404 not found”, but presumably this is not the case in this problem.)
The link’s a 404 because it includes a comma by accident—here’s one that works: http://bayes.wustl.edu/etj/articles/confidence.pdf.
Thanks, that makes sense, although it still butts up closely against my intuition.
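For anyone who wants to check the presumption two comments up, a quick sketch using the posterior from upthread (the epsilon values are arbitrary): pulling the interval’s upper end down to 12 - epsilon forces its lower end down by more than epsilon, because the posterior density is at its highest just below 12.

```python
import numpy as np

N, x1 = 3, 12.0  # sample size and smallest observed lifetime

def interval_length(eps):
    """Length of the 90% posterior interval whose upper end is x1 - eps."""
    hi = x1 - eps
    # Posterior mass on [lo, hi] is exp(N*(hi - x1)) - exp(N*(lo - x1));
    # set it to 0.9 and solve for lo. (No solution once eps > -ln(0.9)/N.)
    lo = x1 + np.log(np.exp(N * (hi - x1)) - 0.9) / N
    return hi - lo

for eps in [0.0, 0.01, 0.02, 0.03]:
    print(eps, interval_length(eps))  # length grows as eps grows
```

So the interval that ends exactly at 12 really is the shortest; whatever you trim from the top is never paid back at the bottom.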