Understandable questions. I hope to expand this talk into a post which will explain things more properly.
Think of the two requirements for Bayes updates as forming a 2x2 matrix. If you have both (1) all information you learned can be summarised into one proposition which you learn with 100% confidence, and (2) you know ahead of time how you would respond to that information, then you must perform a Bayesian update. If you have (2) but not (1), i.e., you update some X to less than 100% confidence but you knew ahead of time how you would update to changed beliefs about X, then you are required to do a Jeffrey update. But if you don’t have (2), updates are not very constrained by Dutch-book type rationality. So in general, Jeffrey argued that there are many valid updates beyond Bayes and Jeffrey updates.
Jeffrey updates are a simple generalization of Bayes updates. When a Bayesian learns X, they update it to 100%, and take P(Y|X) to be the new P(Y) for all Y. (More formally, we want to update P to get a new probability measure Q. We do so by setting Q(Y)=P(Y|X) for all Y.) Jeffrey wanted to handle the case where you somehow become 90% confident of X, instead of fully confident. He thought this was more true to human experience. A Jeffrey update is just the weighted average of the two possible Bayesian updates. (More formally, we want to update P to get Q where Q(X)=c for some chosen c. We set Q(Y) = cP(Y|X) + (1-c)P(Y|~X).)
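To make the contrast concrete, here is a minimal Python sketch; the joint distribution and the 90% figure are made up purely for illustration:

```python
import numpy as np

# Toy joint distribution over two binary variables, indexed as P[x, y].
P = np.array([[0.30, 0.10],   # X false
              [0.20, 0.40]])  # X true

def bayes_update(P, x=1):
    """Learn X=x with certainty: Q(Y) = P(Y | X=x)."""
    return P[x] / P[x].sum()

def jeffrey_update(P, c):
    """Shift the marginal of X to c, holding P(Y|X) and P(Y|~X) fixed:
    Q(Y) = c*P(Y|X) + (1-c)*P(Y|~X)."""
    return c * bayes_update(P, 1) + (1 - c) * bayes_update(P, 0)

print(bayes_update(P, 1))      # posterior over Y after learning X outright
print(jeffrey_update(P, 0.9))  # posterior over Y after becoming 90% sure of X
```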
A natural response for a classical Bayesian is: where does 90% come from? (Where does c come from?) But the Radical Probabilism retort is: where do observations come from? The Bayesian already works in a framework where information comes in from “outside” somehow. The radical probabilist is just working in a more general framework where more general types of evidence can come in from outside.
Pearl argued against this practice in his book introducing Bayesian networks. But he introduced an equivalent—but more practical—concept which he calls virtual evidence. The Bayesian intuition freaks out at somehow updating X to 90% without any explanation. But the virtual evidence version is much more intuitive. (Look it up; I think you’ll like it better.) I don’t think virtual evidence goes against the spirit of Radical Probabilism at all, and in fact if you look at Jeffrey’s writing he appears to embrace it. So I hope to give that version in my forthcoming post, and explain why it’s nicer than Jeffrey updates in practice.
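As a sketch of the virtual-evidence style of update (reusing the made-up numbers from the snippet above, again purely as an illustration): instead of announcing “X is now at 90%”, you report how strongly the unmodelled experience favours X over ~X, and do an ordinary Bayes update with that virtual likelihood.

```python
import numpy as np

# Same toy joint distribution over (X, Y) as in the earlier sketch.
P = np.array([[0.30, 0.10],   # X false
              [0.20, 0.40]])  # X true

def virtual_evidence_update(P, likelihood_ratio):
    """Virtual evidence on X: report the likelihood ratio L(e|X) : L(e|~X) of
    some unmodelled evidence e, then condition on that virtual likelihood."""
    weights = np.array([1.0, likelihood_ratio])  # (L(e|~X), L(e|X))
    joint = P * weights[:, None]                 # reweight each X-row of the joint
    return joint.sum(axis=0) / joint.sum()       # new distribution over Y

# A likelihood ratio of (0.9/0.1) * (P(~X)/P(X)) reproduces the Jeffrey update
# that moves X to 90% confidence.
p_x = P[1].sum()
print(virtual_evidence_update(P, (0.9 / 0.1) * ((1 - p_x) / p_x)))
```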
Jeffrey wanted to handle the case where you somehow become 90% confident of X, instead of fully confident
How does this differ from a Bayesian update? You can update on a new probability distribution over X just as you can on a point value. In fact, if you’re updating the probabilities in a Bayesian network, like you described, then even if the evidence you are updating on is a point value for some initial variable in the graph, the propagation steps will in general be updates on the new probability distributions for parent variables.
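For a toy sketch of what I mean (numbers made up): if a parent X feeds into Y through a conditional probability table, propagating an uncertain belief about X is just a mixture over the table’s rows.

```python
import numpy as np

# Tiny two-node network X -> Y: each row is P(Y | X=x) for x in {false, true}.
cpt_y_given_x = np.array([[0.75, 0.25],
                          [1/3,  2/3]])

def propagate(q_x, cpt):
    """Distribution over Y given an uncertain belief q_x about the parent X:
    Q(Y) = sum_x q_x(x) * P(Y | X=x)."""
    return q_x @ cpt

# Believing X at 90% and propagating takes exactly this kind of mixture.
print(propagate(np.array([0.1, 0.9]), cpt_y_given_x))
```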
Thanks! That answers a lot of my questions even without a concrete example.
I found this part of your reply particularly interesting:
if you don’t have (2), updates are not very constrained by Dutch-book type rationality. So in general, Jeffrey argued that there are many valid updates beyond Bayes and Jeffrey updates.
The abstract example I came up with after reading that was something like ‘I think A at 60%. If I observe X, then I’d update to A at 70%. If I observe Y, then I’d update to A at 40%. If I observe Z, I don’t know what I’d think.’
I think what’s a little confusing is that I imagined these kinds of adjustments were already incorporated into ‘Bayesian reasoning’. Like, for the canonical ‘cancer test result’ example, we could easily adjust our understanding of ‘receives a positive test result’ to include uncertainty about the evidence itself, e.g. maybe the test was performed incorrectly or the result was misreported by the lab.
Do the ‘same’ priors cover our ‘base’ credences in different types of evidence? How are probabilities reasonably, or practically, assigned or calculated for different types of evidence? (Do we need to further adjust our confidence in those assignments or calculations?)
Maybe I do still need a concrete example to reach a decent understanding.
Richard Bradley gives an example of a non-Bayes, non-Jeffrey update in “Radical Probabilism and Bayesian Conditioning”. He calls his third type of update Adams conditioning. But he goes even further, giving an example which is not Bayes, Jeffrey, or Adams (the example with the pipes toward the end; figure 1 and accompanying text). To be honest I still find the example a bit baffling, because I’m not clear on why we’re allowed to predictably violate the rigidity constraint in the case he considers.
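(Roughly, as I read Bradley, and stated here for a binary Y; see the paper for his exact formulation: experience directly shifts the conditional probability P(Y|X) to some new value q while leaving P(X) untouched, and Adams conditioning sets Q(Z) = P(X)[qP(Z|X&Y) + (1-q)P(Z|X&~Y)] + P(~X)P(Z|~X) for all Z. In other words, it’s a Jeffrey update on the partition {X&Y, X&~Y, ~X}, constrained so that the marginal of X doesn’t move.)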
I think what’s a little confusing is that I imagined these kinds of adjustments were already incorporated into ‘Bayesian reasoning’. Like, for the canonical ‘cancer test result’ example, we could easily adjust our understanding of ‘receives a positive test result’ to include uncertainty about the evidence itself, e.g. maybe the test was performed incorrectly or the result was misreported by the lab.
We can always invent a classically Bayesian scenario where we’re uncertain about some particular X, by making it so we can’t directly observe X but instead get some other observations. E.g., we can’t directly observe the test result, but we’re told about it through a fallible line of communication. What’s radical about Jeffrey’s view is to allow the observations themselves to be uncertain. So if you look at, e.g., a color but aren’t sure what you’re looking at, you don’t have to contrive a color-like proposition which you do observe in order to record your imperfect observation of color.
You can think of radical probabilism as “Bayesianism at a distance”: like if you were watching a Bayesian agent, but couldn’t be bothered to record every single little sense-datum. You want to record that the test results are probably positive, without recording your actual observations that make you think that. We can always posit underlying observations which make the radical-probabilist agent classically Bayesian. Think of Jeffrey as pointing out that it’s often easier to work “at a distance” instead, and that once you start thinking this way, you can see it’s closer to your conscious experience anyway. So why posit underlying propositions which make all your updates into Bayes updates?
As for me, I have no problem with supposing the existence of such underlying propositions (I’ll be making a post elaborating on that at some point...) but find radical probabilism to nonetheless be a very philosophically significant point.
Huh, I’m really surprised this isn’t Q(Y) = cP(Y|X) + (1-c)P(Y|~X). Was that a typo? If not, why choose your equation over mine?
Ah, yep! Corrected.
Thanks again!
Your point about “Bayesianism at a distance” makes a lot of sense.