You got it right. The three different cases correspond to different joint distributions over sequences of outcomes. Prior information that one of the cases obtains amounts to picking one of these distributions (of course, one can also have a weighted combination of these distributions if there is uncertainty about which case obtains). It turns out that in this example, if you add together the probabilities of all the sequences that have a red ball in the second position, you get 0.5 under each of the three distributions. So equal prior probabilities. But even though the terms sum to 0.5 in all three cases, the individual terms are not the same. For instance, the prior information of case 1 assigns a different probability to RRRRR (0.004) than the prior information of case 2 does (0.031).
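To make the arithmetic concrete, here is a minimal sketch. It assumes that case 1 is drawing five balls without replacement from an urn with 5 red and 5 white balls, and that case 2 is drawing with replacement from the same urn (independent draws with P(red) = 1/2); these assumptions are reverse-engineered from the 0.004 and 0.031 figures and may differ in detail from the original example.

```python
from itertools import product
from fractions import Fraction

def joint_without_replacement(seq, reds=5, whites=5):
    """P(seq) when drawing without replacement from an urn with 5 red and 5 white balls."""
    p, r, w = Fraction(1), reds, whites
    for ball in seq:
        if ball == 'R':
            p, r = p * Fraction(r, r + w), r - 1
        else:
            p, w = p * Fraction(w, r + w), w - 1
    return p

def joint_with_replacement(seq):
    """P(seq) under independent draws with P(red) = 1/2."""
    return Fraction(1, 2) ** len(seq)

# All 32 possible sequences of five draws.
sequences = [''.join(s) for s in product('RW', repeat=5)]

for name, joint in [('no replacement', joint_without_replacement),
                    ('with replacement', joint_with_replacement)]:
    # Marginal probability of a red ball in the second position:
    # sum the joint probabilities of every sequence with R in slot 2.
    marginal = sum(joint(s) for s in sequences if s[1] == 'R')
    print(f'{name}: P(red in 2nd position) = {float(marginal):.3f}, '
          f'P(RRRRR) = {float(joint("RRRRR")):.3f}')
# Both marginals come out to 0.500, while P(RRRRR) is 0.004 vs 0.031.
```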
So the prior information is a joint distribution over sequences of outcomes, while the prior probability of the hypothesis is (in this example at least) a marginal probability calculated from this joint distribution. Since multiple joint distributions can yield the same marginal distribution for some random variable, different prior information can correspond to the same prior probability.
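In symbols (just restating the point above), writing $s$ for a complete sequence of five draws and $s_2$ for the colour of the second draw, the prior probability of the hypothesis is the marginal sum
$$P(s_2 = R) \;=\; \sum_{s \,:\, s_2 = R} P(s),$$
and many different joint distributions $P(s)$ can produce the same value for this sum.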
When you restrict attention to the sequences that have a red ball in the first position and add together the (appropriately renormalized) joint probabilities of the sequences that also have a red ball in the second position, you are computing a conditional probability, and here the three distributions do not give the same number. This corresponds to the fact that the three distributions encode different learning rules: the same observation of a red first draw updates them differently.
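Under the same assumed models as the sketch above, this conditional probability can be computed by the same kind of enumeration:

```python
from itertools import product
from fractions import Fraction

def joint_without_replacement(seq, reds=5, whites=5):
    """P(seq) when drawing without replacement from an urn with 5 red and 5 white balls."""
    p, r, w = Fraction(1), reds, whites
    for ball in seq:
        if ball == 'R':
            p, r = p * Fraction(r, r + w), r - 1
        else:
            p, w = p * Fraction(w, r + w), w - 1
    return p

def joint_with_replacement(seq):
    """P(seq) under independent draws with P(red) = 1/2."""
    return Fraction(1, 2) ** len(seq)

sequences = [''.join(s) for s in product('RW', repeat=5)]

for name, joint in [('no replacement', joint_without_replacement),
                    ('with replacement', joint_with_replacement)]:
    # Restrict to sequences starting with R, renormalize their joint probabilities,
    # and sum the ones that also have R in the second position.
    first_red = [s for s in sequences if s[0] == 'R']
    total = sum(joint(s) for s in first_red)
    cond = sum(joint(s) for s in first_red if s[1] == 'R') / total
    print(f'{name}: P(red 2nd | red 1st) = {float(cond):.3f}')
# no replacement: 0.444 (i.e. 4/9); with replacement: 0.500. The same evidence
# moves the two priors to different places, which is what "different learning
# rules" amounts to here.
```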