You can make the analogy clearer if you imagine, instead of rummaging around in a hat, you lined up all the strips of paper in random order and read them one at a time. Then it makes sense that the total number of slips of paper shouldn’t matter.
This still doesn’t seem right to me. If a paper is the third paper, then the n-3 remaining papers will not have the same thing written on them as the third paper, and therefore it is less likely that I will observe whatever the third paper was than it was when I started. In the hat with replacement I have the same chance of seeing each one again after I have observed it.
It stands to reason that if there are N papers, Y of them yeses, and I see and remove a yes on the first trial, then P(y_2 | y_1) = (Y-1)/(N-1). This now becomes our prior, and we use the same rule if we see another yes. If instead we see a no, P(y_2 | ~y_1) = Y/(N-1). Under this reasoning, it is clear that without replacement, as you remove yeses, you should expect nos more often because there are fewer yeses left.
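To make the known-Y case concrete, here is a small sketch of the update described above. The function name and the example numbers (N = 10, Y = 4) are made up for illustration; the only assumption is that Y is known exactly.

```python
from fractions import Fraction

def p_next_yes_known(N, Y, yes_seen, no_seen):
    """P(next slip says yes) when the hat is KNOWN to hold Y yeses among N slips,
    drawing without replacement."""
    remaining_slips = N - yes_seen - no_seen
    remaining_yeses = Y - yes_seen
    return Fraction(remaining_yeses, remaining_slips)

# Hypothetical hat: N = 10 slips, Y = 4 of them yeses.
after_yes = p_next_yes_known(10, 4, yes_seen=1, no_seen=0)  # (Y-1)/(N-1)
after_no = p_next_yes_known(10, 4, yes_seen=0, no_seen=1)   # Y/(N-1)
print(after_yes, after_no)
```

With Y fixed and known, seeing a yes does lower the chance of the next yes, exactly as the formulas say.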
It seems that way because you are imagining holding the number of Ys constant. However, if the number of Ys is unknown, you have to figure out what proportion of the slips say Y as you go along, so you get a different result.
Maybe an analogy will help. Because you draw the slips of paper in random order, they will not be correlated with each other except through the total percentages that say Y and N. Analogously, if you flip a weighted coin, the flips will not be correlated with each other except through the bias of the coin. Drawing a slip of paper follows the exact same mathematical rules as flipping a weighted coin. And so since Laplace’s rule of succession works for the weighted coin, it also works for the slips of paper.
Since you’re already thinking about keeping the number of Ys fixed, you may object, “but the number of Ys is fixed in the case of the papers and not fixed in the case of the coin, so they must be different.” So we can go a step further and imagine someone else flipping the coin and writing down what they get. Now when we read the papers, there is a fixed number of Ys, but since it’s the same coin flips all along, the probability of seeing Y or N is exactly the same. This demonstrates that having a finite amount of stuff doesn’t really matter; what matters is the mathematical rules that stuff follows.
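If it helps, the unknown-Y case can be checked by brute force. The sketch below assumes a uniform prior over the number of Ys (every count from 0 to N equally likely beforehand), which is what you get if the slips were written from flips of a coin whose bias is itself uniformly unknown. That prior and the function names are my assumptions for illustration. It computes the exact chance that the next slip says Y after some draws without replacement.

```python
from fractions import Fraction

def falling(x, k):
    """Falling factorial x * (x-1) * ... * (x-k+1); equals 0 when 0 <= x < k."""
    out = 1
    for i in range(k):
        out *= x - i
    return out

def p_next_yes_unknown(N, yes_seen, no_seen):
    """P(next slip says yes) when the number of yeses Y is unknown,
    assuming a uniform prior on Y in 0..N and draws without replacement."""
    n = yes_seen + no_seen
    assert n < N, "need at least one slip left to draw"
    # Likelihood of the observed draws for each possible Y (common factors dropped).
    weights = {Y: falling(Y, yes_seen) * falling(N - Y, no_seen)
               for Y in range(N + 1)}
    total = sum(weights.values())
    # Average the known-Y answer (Y - yes_seen)/(N - n) over the posterior on Y.
    return sum(Fraction(w, total) * Fraction(Y - yes_seen, N - n)
               for Y, w in weights.items())

def laplace(yes_seen, no_seen):
    """Laplace's rule of succession: (s + 1) / (n + 2)."""
    return Fraction(yes_seen + 1, yes_seen + no_seen + 2)

for N in (5, 10, 50):
    print(N, p_next_yes_unknown(N, 1, 0), laplace(1, 0))
```

Under that prior the answer does not depend on N at all and matches the rule of succession, so seeing a yes makes the next yes more likely even though one yes has been used up.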
Add-on:
Thanks :)