This still doesn’t seem right to me. If a paper is drawn third, it is no longer among the n-3 remaining papers, so fewer of those papers have the same thing written on them as the third paper. Therefore it is less likely that I will observe whatever the third paper said than it was when I started. In the hat with replacement, by contrast, each paper has the same chance of being drawn after I have observed it as it did before.
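To make the with/without replacement contrast concrete, here is a small sketch that computes exactly the probability that the second draw shows the same label as the first. The hat contents (5 yeses and 5 nos) and the function name p_repeat are just a hypothetical example, not anything from the original problem.

```python
from fractions import Fraction
from collections import Counter

def p_repeat(hat, replace):
    """Exact probability that draw 2 shows the same label as draw 1.

    hat: list of labels (hypothetical example contents).
    replace: True if the first paper goes back into the hat.
    """
    n = len(hat)
    counts = Counter(hat)
    total = Fraction(0)
    for label, k in counts.items():
        p_first = Fraction(k, n)  # chance the first draw shows this label
        if replace:
            p_second = Fraction(k, n)  # hat unchanged
        else:
            p_second = Fraction(k - 1, n - 1)  # one copy of this label removed
        total += p_first * p_second
    return total

hat = ["yes"] * 5 + ["no"] * 5
print(p_repeat(hat, replace=True))   # 1/2
print(p_repeat(hat, replace=False))  # 4/9
```

Without replacement, seeing the same label again drops from 1/2 to 4/9, because the paper you observed is no longer available to be drawn.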
It stands to reason that if there are N papers, Y of them yeses, and I see and remove a yes on the first draw, then P(y_2|y_1) = (Y-1)/(N-1). This becomes our new prior, and we apply the same rule again if we see another yes. If instead we see a no on the first draw, P(y_2|~y_1) = Y/(N-1). Under this reasoning, it is clear that without replacement, each yes you remove makes a no more likely on the next draw, because there are fewer yeses left.
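A minimal sketch of that updating rule, assuming the hat starts with Y yeses out of N papers and every drawn paper stays out. The function name p_next_yes and the history-as-list-of-booleans encoding are my own illustrative choices.

```python
from fractions import Fraction

def p_next_yes(Y, N, history):
    """P(next draw is yes | papers already drawn), without replacement.

    Y: yeses initially in the hat; N: total papers.
    history: hypothetical encoding, a list of bools (True = yes drawn).
    """
    yes_left = Y - sum(history)      # each observed yes removes one yes
    total_left = N - len(history)    # each draw removes one paper
    if total_left <= 0 or not 0 <= yes_left <= total_left:
        raise ValueError("history is inconsistent with Y and N")
    return Fraction(yes_left, total_left)

print(p_next_yes(5, 10, []))            # 1/2, the starting prior Y/N
print(p_next_yes(5, 10, [True]))        # 4/9 = (Y-1)/(N-1)
print(p_next_yes(5, 10, [False]))       # 5/9 = Y/(N-1)
print(p_next_yes(5, 10, [True, True]))  # 3/8, same rule applied again
```

Each result is just the count of yeses left over the count of papers left, so applying the rule draw after draw gives the same answer as recounting the hat.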
Thanks :)