I see how this will work for a continuous distribution like the beta distribution. Visually, the effect of a high number of samples is that the curve becomes more sharply peaked around the most probable value, and the outlier cases become improbable more quickly as we move outwards.
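As a rough numerical sketch of that sharpening (my own illustration, assuming a uniform Beta(1, 1) prior and an observed defect proportion held near 5%; the numbers are made up):

```python
# Sketch: with the observed defect proportion held near 5%, a larger sample
# makes the beta posterior more sharply peaked around that proportion.
from scipy.stats import beta

for k, n in [(1, 20), (10, 200), (100, 2000)]:   # k defective items out of n sampled
    post = beta(1 + k, 1 + n - k)                # posterior under a uniform Beta(1, 1) prior
    print(f"n={n:5d}  mean={post.mean():.4f}  sd={post.std():.4f}")
```

The mean stays near 0.05 while the standard deviation shrinks, which is exactly the sharpening described above.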
But then this must mean that the discrete, “perfect”, “infinite-sample” likelihood distribution used in the Wikipedia example has a very strong influence on the posterior, almost marginalising the effect of the prior. Do I reason correctly here?
And does this “infinite-sample” likelihood distribution really have such a strong effect in the Wikipedia example? (I don’t know how to judge this)
I suspect we should distinguish two points under discussion: first, there is the question of the rate of defective material that a machine spits out, and second, there is the question of how much knowing that a piece of material is defective tells us about which machine processed it.
satt’s comment handles the second point; when we are trying to estimate which machine produced a single defective product, the sample size of products is, by necessity, one. (Because we’ve implicitly assumed that the defectivity of products is independent, sampling more of them isn’t really any more interesting than sampling one of them.)
But in order to do that calculation, we need some information about how much defective product each machine produces. As it turns out, we only need the first moment (i.e. mean) of that estimate; higher moments (like the variance) don’t show up in the calculation. (Is it clear to you how to verify that statement?) So a 5% chance that I’m absolutely certain of and a 5% chance that comes from a guess lead to the same final output.
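To make that concrete, here is a small sketch of the calculation with made-up numbers in the spirit of the Wikipedia example (the 20%/30%/50% shares and 5%/3%/1% defect rates are assumptions chosen for illustration). Only the point values of the defect rates ever enter the formula; their uncertainty does not.

```python
# P(machine | one defective item) via Bayes' theorem. Only the mean defect
# rates appear; how confidently we know them never enters the calculation.
shares = {"A": 0.20, "B": 0.30, "C": 0.50}        # prior: each machine's share of output
defect_rates = {"A": 0.05, "B": 0.03, "C": 0.01}  # likelihood: P(defective | machine)

def posterior(shares, rates):
    joint = {m: shares[m] * rates[m] for m in shares}
    total = sum(joint.values())                   # P(defective)
    return {m: joint[m] / total for m in joint}

print(posterior(shares, defect_rates))            # A ~ 0.417, B ~ 0.375, C ~ 0.208
```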
And does this “infinite-sample” likelihood distribution really have such a strong effect in the Wikipedia example? (I don’t know how to judge this)
For many probabilistic calculations, it’s helpful to do a sensitivity analysis. That is, we jiggle around the inputs we gave (like the percentage of the total output that each machine produces, or the defectivity rate of each machine, and so on) to determine how strongly they influence the outcome of the procedure. If we were just guessing with the 5% number, but we discover that dropping it to 4% makes a huge difference, then maybe we should go back and refine our estimate to be sure that it’s 5% instead of 4%. If the result is roughly the same, then our estimate is probably good enough.
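A sketch of that sensitivity check, reusing the hypothetical numbers from the snippet above: nudge machine A’s defect rate from 5% down to 4% and see how much P(A | defective) moves.

```python
# Sensitivity check: how much does the posterior move if the 5% input was
# only a guess? (Same hypothetical shares and rates as above.)
shares = {"A": 0.20, "B": 0.30, "C": 0.50}

def p_a_given_defect(rate_a):
    rates = {"A": rate_a, "B": 0.03, "C": 0.01}
    total = sum(shares[m] * rates[m] for m in shares)   # P(defective)
    return shares["A"] * rates["A"] / total

for rate_a in (0.05, 0.04):
    print(f"defect rate A = {rate_a:.2f}  ->  P(A | defective) = {p_a_given_defect(rate_a):.3f}")
```

In this toy case the posterior drops from about 0.417 to about 0.364, a noticeable shift, so it would be worth pinning the 5% figure down more carefully.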
If only the mean of the likelihood distribution is involved, not the variance, then indeed the sample size used when creating the likelihood distribution has no influence on the Bayesian update.
Then the next question is: is it a problem?
If I understand you correctly, your answer is: “not really, because ”.
Then it’s only the part I don’t get.
You ask me if it’s clear to me why only the mean of the likelihood distribution is involved in the Bayesian update. Well, it isn’t currently, but I’ll read the article “Continuous Bayes” and see if it becomes clearer to me:
http://www.sidhantgodiwala.com/blog/2015/03/14/continuous-bayes/