If you made no approximations, the normatively correct approach is to carry around your current probability estimate p, and a table which contains what p would be updated to under all possible pieces of evidence you could receive. For example, I might say “I know very little about sports, so I’ll assign probability 50% that the Dallas Cowboys will win their next game, but if my friend who follows football tells me they will, I’ll assign probability 75%, and if I see a bookie’s odds, I’ll adopt the implied probability estimate.” (This is, of course, an incomplete list—there are many, many other pieces of evidence I could see.) Obviously, these updates should follow the laws of probability on pain of paradox.
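The "table of updates" can be written down literally. A minimal sketch, using the made-up numbers from the example above (the "predicts a loss" entry is my own symmetric assumption, added only for illustration):

```python
# Current estimate, plus a (necessarily partial) table mapping each
# possible piece of evidence to the probability I'd hold after seeing it.
p_cowboys_win = 0.50

update_table = {
    "football friend predicts a win": 0.75,
    "football friend predicts a loss": 0.25,  # assumed symmetric; illustrative only
    "bookie posts odds": "adopt the implied probability",
    # ...many, many more entries in the full table
}

print(update_table["football friend predicts a win"])
```

The laws of probability constrain these entries jointly; for instance, the win and loss rows must be consistent with a single likelihood model of the friend.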
Why is this necessary to do things correctly? You can work out that my friend’s prediction, because it moved me from 1:1 odds to 3:1 odds, had a likelihood ratio of three. But where did that 3 come from? It’s the interaction between my knowledge and my friend’s knowledge. If the same friend makes the same prediction a second time, I shouldn’t update my probability: the first time, they gave me useful information; the second time, they give me nothing new. If a second friend also predicts that the Cowboys will win, then I need to estimate how correlated the two friends’ predictions are in order to determine how to update.
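The arithmetic here is the odds form of Bayes’ rule: posterior odds = prior odds × likelihood ratio. A small sketch, treating the two limiting cases for the second friend (perfectly correlated with the first, so likelihood ratio 1; or fully independent, so likelihood ratio 3 again):

```python
def update_odds(p, likelihood_ratio):
    """Odds form of Bayes' rule: posterior odds = prior odds * likelihood ratio."""
    posterior_odds = (p / (1 - p)) * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

p = update_odds(0.5, 3)     # friend's prediction moves 1:1 odds to 3:1
print(p)                    # 0.75
print(update_odds(p, 1))    # same friend repeats: no new info, stays 0.75
print(update_odds(p, 3))    # fully independent second friend: 3:1 -> 9:1, i.e. 0.9
```

A real second friend falls somewhere between these extremes, which is exactly why the correlation estimate is needed.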
The hyperparameter approach is the clean way to do this in cases where the likelihood of incoming evidence given existing evidence is easy to determine. If I’ve got a coin flipped in a random fashion (but weighted in an unknown way), then I think successive flips are independent and equally indicative of the coin’s underlying propensity to land heads when flipped randomly. But if I’ve got a coin flipped in a precisely controlled, deterministic fashion, then I don’t think successive flips are independent or equally indicative of that propensity, because the “propensity” is no longer a useful node in my model.
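For the randomly flipped coin, the standard concrete form of the hyperparameter approach is a Beta distribution over the heads propensity, updated by counting. A minimal sketch (the uniform Beta(1, 1) prior is an assumption for illustration):

```python
# Track Beta(alpha, beta) pseudo-counts over the coin's unknown heads
# propensity. Because flips are conditionally independent given that
# propensity, updating on a flip is just incrementing one count.
def update_beta(alpha, beta, flip_is_heads):
    return (alpha + 1, beta) if flip_is_heads else (alpha, beta + 1)

def prob_next_heads(alpha, beta):
    """Predictive probability of heads: the posterior mean of the propensity."""
    return alpha / (alpha + beta)

a, b = 1, 1  # uniform prior over the weighting
for flip in [True, True, False, True]:
    a, b = update_beta(a, b, flip)
print(prob_next_heads(a, b))  # 4/6, about 0.667
```

For the deterministically flipped coin this model is simply wrong: the flips are not exchangeable, so no single “propensity” hyperparameter summarizes the evidence, and the counts stop being sufficient statistics.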