Virtual evidence requires probability functions to take arguments which aren’t part of the event space
Not necessarily. Typically, the events would be all the Lebesgue measurable subsets of the state space. That’s large enough to furnish a suitable event to play the role of the virtual evidence. In the example involving A, B, and the virtual event E, one would also have to somehow specify that the dependencies of A and B on E are in some sense independent of each other, but you already need that. That assumption is what gives sequence-independence.
The sequential dependence of the Jeffrey update results from violating that assumption. Updating P(B) to 60% already increases P(A), so updating from that new value of P(A) to 60% is a different update from the one you would have made by updating on P(A)=60% first.
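To make the order dependence concrete, here is a minimal sketch. The joint prior over A and B is a made-up example (the numbers are assumptions, chosen only so that A and B are positively correlated), and the helper names `jeffrey` and `virtual` are hypothetical. It compares two Jeffrey updates applied in either order with two virtual-evidence updates, each modelled as a likelihood ratio that touches only one variable.

```python
from fractions import Fraction as F

A, B = 0, 1  # positions of A and B inside each outcome tuple

# Hypothetical joint prior over (A, B); A and B are positively correlated.
prior = {
    (True, True): F(3, 10),
    (True, False): F(1, 10),
    (False, True): F(1, 10),
    (False, False): F(5, 10),
}

def marginal(joint, index):
    """Probability that the variable at `index` is true."""
    return sum(p for outcome, p in joint.items() if outcome[index])

def jeffrey(joint, index, target):
    """Jeffrey update: force the marginal to `target`, keep the conditionals."""
    m = marginal(joint, index)
    on, off = target / m, (1 - target) / (1 - m)
    return {o: p * (on if o[index] else off) for o, p in joint.items()}

def virtual(joint, index, ratio):
    """Virtual evidence with likelihood ratio P(E|var) : P(E|not var) = ratio : 1."""
    weighted = {o: p * (ratio if o[index] else 1) for o, p in joint.items()}
    total = sum(weighted.values())
    return {o: p / total for o, p in weighted.items()}

sixty = F(3, 5)
b_then_a = jeffrey(jeffrey(prior, B, sixty), A, sixty)
a_then_b = jeffrey(jeffrey(prior, A, sixty), B, sixty)

v_b_then_a = virtual(virtual(prior, B, 3), A, 3)
v_a_then_b = virtual(virtual(prior, A, 3), B, 3)

print("P(A) after Jeffrey update on B alone:", float(marginal(jeffrey(prior, B, sixty), A)))
print("Jeffrey updates commute:", b_then_a == a_then_b)
print("Virtual-evidence updates commute:", v_b_then_a == v_a_then_b)
```

In this sketch the Jeffrey update on B alone already raises P(A) above its prior value, so the subsequent update of P(A) to 60% is a different operation in each order. The virtual-evidence updates are plain multiplications followed by renormalisation, so their order cannot matter.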
virtual evidence treats Bayes’ Law (which is usually a derived theorem) as more fundamental than the ratio formula (which is usually taken as a definition).
That is the view taken by Jaynes, a dogmatic Bayesian if ever there was one. For Jaynes, all probabilities are conditional probabilities, and when one writes baldly P(A), this is really P(A|X), the probability of A given background information X. X may be unstated but is never absent: there is always background information. This also resolves Eliezer’s Pascal’s Muggle conundrum over how he should react to 10^10-to-1 evidence in favour of something for which he has a probability of 10^100-to-1 against. The background information X that went into the latter figure is called into question.
I notice that this suggests an approach to allowing one to update away from probabilities of 0 or 1, conventionally thought impossible.
Not necessarily. Typically, the events would be all the Lebesgue measurable subsets of the state space. That’s large enough to furnish a suitable event to play the role of the virtual evidence.
True, although I’m not sure this would be “virtual evidence” anymore by Pearl’s definition. But maybe this is the way it should be handled.
A different way of spelling out what’s unique about virtual evidence is that, while compatible with Bayes, the update itself is calculated via some other means (i.e., we get the likelihoods from somewhere other than pre-defined likelihoods for the proposition we’re updating on).
This also resolves Eliezer’s Pascal’s Muggle conundrum over how he should react to 10^10-to-1 evidence in favour of something for which he has a probability of 10^100-to-1 against. The background information X that went into the latter figure is called into question.
Does it, though? If you were going to call that background evidence into question for a mere 10^10-to-1 evidence, should the probability have been 10^100-to-1 against in the first place?
Does it, though? If you were going to call that background evidence into question for a mere 10^10-to-1 evidence, should the probability have been 10^100-to-1 against in the first place?
This is verging on the question of what to do when the truth is not in the support of your model. That may be the main way you reach 10^100-to-1 odds in practice. Non-Bayesians like to pose this question as a knock-down of Bayesianism. I don’t agree with them, but I’m not qualified to argue the case.
Once you’ve accepted some X as evidence, i.e. conditioned all your probabilities on X, how do you recover from that when meeting new evidence Y that is extraordinarily unlikely (e.g. probability 10^-100) given X? Pulling X out from behind the vertical bar may be a first step, but that still leaves you contemplating the extraordinarily unlikely proposition X&Y that you have nevertheless observed.
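As a rough illustration of what pulling X out from behind the vertical bar could look like numerically, here is a small sketch. Every number in it is an assumption made up for the example: a prior on X that falls short of certainty by 10^-20, P(Y|X) = 10^-100, and P(Y|not X) = 10^-3. It only shows the arithmetic of treating X as uncertain; it does not settle where such numbers should come from.

```python
from fractions import Fraction as F

# Hypothetical numbers, chosen only for illustration.
p_x = 1 - F(1, 10**20)          # prior confidence in the background information X
p_y_given_x = F(1, 10**100)     # Y is extraordinarily unlikely if X holds
p_y_given_not_x = F(1, 10**3)   # Y is merely unlikely if X fails

# Total probability of Y once X is no longer held fixed.
p_y = p_x * p_y_given_x + (1 - p_x) * p_y_given_not_x

# Bayes' theorem for X given the observation Y.
p_x_given_y = p_x * p_y_given_x / p_y

print("P(Y)   =", float(p_y))
print("P(X|Y) =", float(p_x_given_y))
```

Under these made-up numbers, observing Y drives the probability of X down to roughly 10^-77, because almost all of P(Y) comes from the tiny slice of prior mass where X is false. If X had been conditioned on with no residual doubt at all, no such recovery would be available, which is the difficulty raised above.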