The NIH explanation is terrible, which is not surprising. The concept of a confidence interval is one of the hardest topics to explain and to learn.
Firstly and most importantly, any confidence interval depends upon a model. If the model itself is incorrect, the interval means nothing. The parameter itself may also mean nothing, depending upon how badly the model fails.
You address much of the rest in your post, in particular their butchering of what a confidence interval means.
In practice, methods for assigning confidence intervals to data are usually monotonic, in the sense that increasing the true value of q monotonically shifts the distribution of the interval endpoints. This isn’t a necessary property of confidence intervals, but it’s a useful one, and by design it holds very often. It rules out the pathological behaviour in the post: if the lower endpoint is at least 0.9 with some probability p for some q in [0.4, 0.6], then for every q > 0.6 that probability is at least p. In other words, with any such procedure, when the population really does have a strong preference toward sandwiches, your probability of detecting it is at least as high as it would be for a weaker or opposite preference.
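To make that concrete, here is a minimal sketch (not from the post) using the Wilson score interval as an example of a standard, well-behaved interval for a binomial proportion. The sample size n = 100, the 95% level, and the 0.9 threshold are arbitrary choices for illustration; the point is just that the estimated probability of the lower endpoint clearing the threshold does not decrease as the true q grows.

```python
import math
import random

random.seed(0)

def wilson_lower(successes, n, z=1.96):
    """Lower endpoint of the 95% Wilson score interval for a binomial proportion."""
    phat = successes / n
    denom = 1 + z**2 / n
    centre = phat + z**2 / (2 * n)
    margin = z * math.sqrt(phat * (1 - phat) / n + z**2 / (4 * n**2))
    return (centre - margin) / denom

def prob_lower_exceeds(q, n=100, threshold=0.9, trials=20_000):
    """Monte Carlo estimate of P(lower endpoint >= threshold) when the true value is q."""
    hits = 0
    for _ in range(trials):
        successes = sum(random.random() < q for _ in range(n))
        if wilson_lower(successes, n) >= threshold:
            hits += 1
    return hits / trials

# The estimated probability rises monotonically with q: essentially zero for
# q in [0.4, 0.6], and higher for every stronger preference.
for q in (0.5, 0.6, 0.7, 0.8, 0.9, 0.95):
    print(f"q = {q:.2f}: P(lower endpoint >= 0.9) ~ {prob_lower_exceeds(q):.4f}")
```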
So yes, if you choose a bad confidence interval function then you can get horrible results.