As I understand it, frequentism requires large numbers of events for its interpretation of probability, whereas the Bayesian interpretation accepts that relative frequencies converge to probabilities, but claims that probability is a meaningful concept even when applied to unique events, as a “degree of plausibility”.
Do you (or anyone else reading this) know of any attempts to give a precise non-frequentist interpretation of the exact numerical values of Bayesian probabilities? What I mean is someone trying to give a precise meaning to the claim that the “degree of plausibility” of a hypothesis (or prediction or whatever) is, say, 0.98, which wouldn’t boil down to the frequentist observation that relative to some reference class, it would be right 98⁄100 of the time, as in the above quoted example.
Or to put it in a way that might perhaps be clearer, suppose we’re dealing with the claim that the “degree of plausibility” of a hypothesis is 0.2. Not 0.19, or 0.21, or even 0.1999 or 0.2001, but exactly that specific value. Now, I have no intuition whatsoever for what it might mean that the “degree of plausibility” I assign to some proposition is equal to one of these numbers and not any of the other mentioned ones—except if I can conceive of an experiment or observation (or at least a thought-experiment) that would yield that particular exact number via a frequentist ratio.
I’m not trying to open the whole Bayesian vs. frequentist can of worms at this moment; I’d just like to find out if I’ve missed any significant references that discuss this particular question.
Have you seen my What Are Probabilities, Anyway? post?

Yes, I remember reading that post a while ago when I was still just lurking here. But I forgot about it in the meantime, so thanks for bringing it to my attention again. It’s something I’ll definitely need to think about more.
In the Bayesian interpretation, the numerical value of a probability is derived via considerations such as the principle of indifference—if I know nothing more about proposition A than I know about proposition B, then I hold both equally probable. (So, if all I know about a coin is that it is a biased coin, without knowing how it is biased, I still hold Heads and Tails to be equally probable outcomes of the next coin flip.)
If we do know something more about A or B, then we can apply formulae such as the sum and product rules, or Bayes’ rule, which is derived from them, to obtain a “posterior probability” based on our initial estimate (or “prior probability”). (In the coin example, I would be able to take into account any number of coin flips as evidence, but I would first need to specify, through such a prior probability, what I take “a biased coin” to mean in terms of probability; whereas a frequentist approach relies only on flipping the coin enough times to reach a given degree of confidence.)
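To make the coin example concrete, here is a minimal sketch in Python (my own, not from Jaynes); the discretized flat prior is just one possible way to cash out “a biased coin”:

```python
# Bayesian updating for a coin of unknown bias.
# Assumption: "a biased coin" is modeled as a flat prior over a grid of
# candidate heads-probabilities; this is one choice among many.
import numpy as np

biases = np.linspace(0.01, 0.99, 99)        # candidate values of P(heads)
prior = np.ones_like(biases) / len(biases)  # indifference: flat prior

def update(prior, heads, tails):
    """Posterior over the bias after observing coin flips (Bayes' rule)."""
    likelihood = biases**heads * (1 - biases)**tails
    posterior = prior * likelihood
    return posterior / posterior.sum()

print((biases * prior).sum())      # 0.5: before any flips, Heads and Tails
                                   # are equally probable, as above
posterior = update(prior, heads=7, tails=3)
print((biases * posterior).sum())  # roughly 0.67 after 7 heads, 3 tails
```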
(Note, this is my understanding based on having partially read through precisely one text—Jaynes’ Probability Theory—on top of some Web browsing; not an expert’s opinion.)
Do you (or anyone else reading this) know of any attempts to give a precise non-frequentist interpretation of the exact numerical values of Bayesian probabilities?
Yes, you can do this precisely with measure theory, but some will argue that that is nice math but not a philosophically satisfying approach.
Edit: A more concrete approach is to just think about it as what bets you should make about possible outcomes.
Yes, you can do this precisely with measure theory, but some will argue that that is nice math but not a philosophically satisfying approach.
I’m not sure I understand what exactly you have in mind. I am aware of the role of measure theory in the standard modern formalization of probability theory, and how it provides for a neat treatment of continuous probability distributions. However, what I’m interested in is not the math, but the meaning of the numbers in the real world.
Bayesians often make claims like, say, “I assign the probability of 0.2 to the hypothesis/prediction X.” This is a factual claim, which asserts that some quantity is equal to 0.2, not any other number. This means that those making such claims should be able to point at some observable property of the real world related to X that gives rise to this particular number, not some other one. What I’d like to find out is whether there are attempts at non-frequentist responses to this sort of request.
Edit: A more concrete approach is to just think about it as what bets you should make about possible outcomes.
But it seems to me that betting advice is fundamentally frequentist in nature. As far as I can see, the only practical test of whether a betting strategy is good or bad is the expected gain (or loss) it will provide over a large number of bets. [Edit: I should have been more clear here—I assume that you are not using an incoherent strategy vulnerable to a Dutch book. I had in mind strategies where you respect the axioms of probability, and the only question is which numbers consistent with them you choose.]
Bayesians would say that the probability is (some function of) the expected value of one bet.
Frequentists would say that it is (some function of) the actual value of many bets (as the number of bets goes to infinity).
The whole point of looking at many bets is to make the average value close to the expected value (so that frequentists don’t have to think about what “expected” actually means). You never have to say “the expected gain … over a large number of bets.” That would be redundant.
What does “expected” actually mean? It’s just the probability you should bet at to avoid the possibility of being Dutch-booked on any single bet.
ETA: When you are being Dutch-booked, you don’t get to look at all the offered bets at once and say “hold on a minute, you’re trying to trick me”. You get given each of the bets one at a time, and you have to bet Bayesianly for each one if you want to avoid any possibility of sure losses.
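To illustrate the two readings with a toy sketch of my own (the bet and its numbers are hypothetical): the Bayesian reads the probability off a single bet’s expected value, while the frequentist recovers approximately the same number as the average payoff over many repetitions.

```python
# Expected value of one bet vs. the long-run average of many bets.
# Hypothetical bet: pays $100 if the event happens, nothing otherwise.
import random

p = 0.3          # probability assigned to the event
payoff = 100.0   # what the bet pays if the event occurs

expected_one_bet = p * payoff          # Bayesian: meaningful for a single bet
print(expected_one_bet)                # 30.0

random.seed(0)
n = 100_000
average_many_bets = sum(payoff if random.random() < p else 0.0
                        for _ in range(n)) / n
print(average_many_bets)               # close to 30.0 for large n
```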
I might be mistaken, but I think this still doesn’t answer my question. I understand—or at least I think I do—how the Dutch book argument can be used to establish the axioms of probability and the entire mathematical theory that follows from them (including Bayes’ theorem).
The way I understand it, this argument says that once I’ve assigned some probability to an event, I must assign all the other probabilities in a way consistent with the probability axioms. For example, if I assign P(A) = 0.3 and P(B) = 0.4, I would be opening myself to a Dutch book if I assigned, say, P(~A) != 0.7 or P(A and B) > 0.3. So far, so good.
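(To spell those constraints out, here is a quick sketch; the bounds are just the standard consistency conditions implied by the probability axioms:)

```python
# Coherence constraints implied by P(A) = 0.3 and P(B) = 0.4.
p_a, p_b = 0.3, 0.4

p_not_a = 1 - p_a                # must be exactly 0.7
lower = max(0.0, p_a + p_b - 1)  # P(A and B) cannot fall below this
upper = min(p_a, p_b)            # ...or rise above this
print(p_not_a, lower, upper)     # 0.7, 0.0, 0.3: so P(A and B) > 0.3
                                 # would open me to a Dutch book
```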
However, I still don’t see what, if anything, the Dutch book argument tells us about the ultimate meaning of the probability numbers. If I claim that the probability of Elbonia declaring war on Ruritania before next Christmas is 0.3, then to avoid being Dutch-booked, I must maintain that the probability of that event not happening is 0.7, and all the other stuff necessitated by the probability axioms. However, if someone comes to me and claims that the probability is not 0.3, but 0.4 instead, in what way could he argue, under any imaginable circumstances and either before or after the fact, that his figure is correct and mine not? What fact observable in physical reality could he point out and say that it’s consistent with one number, but not the other?
I understand that if we both stick to our different probabilities and make bets based on them, we can get Dutch-booked collectively (someone sells him a bet that pays off $100 if the war breaks out for $39, and to me a bet that pays off $100 in the reverse case for $69—and wins $8 whatever happens). But this merely tells us that something irrational is going on if we insist (and act) on different probability estimates. It doesn’t tell us, as far as I can see, how one number could be correct, and all others incorrect—unless we start talking about a large reference class of events and frequencies at some point.
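For what it’s worth, the bookie’s arithmetic in that parenthetical checks out; a quick sketch:

```python
# The bookie's side of the collective Dutch book described above:
# he sells "pays $100 if war" for $39 (to him) and
# "pays $100 if no war" for $69 (to me).
income = 39 + 69                     # $108 collected up front
for outcome in ("war", "no war"):
    payout = 100                     # exactly one of the two bets pays off
    print(outcome, income - payout)  # $8 profit either way
```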
There’s nothing mysterious about it as far as I can tell, it’s “just math”.
Give me a six-sided die and I’ll compute the probability of it coming up 4 as 1⁄6. This simple exercise can become more complicated in one of two ways. You can ask me to compute the probability of a more complex event, e.g. “three even numbers in a row”. This still has an exact answer.
The other complication is if the die is loaded. One way I might find out how that affects its single-throw probabilities is by throwing it a large number of times, but conceivably I can also X-ray the die, find out how its mass is distributed, and deduce from that how the single-throw probabilities differ. (Offhand I’d say that faces closer to the center of mass of the die are more likely to come up, but perhaps the calculation is more interesting than that.)
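For the fair-die case the numbers are exact and easy to verify; a minimal sketch:

```python
from fractions import Fraction

# Fair six-sided die: each face has probability 1/6.
p_four = Fraction(1, 6)
print(p_four)                 # 1/6

# "Three even numbers in a row": each throw is even with probability 3/6,
# and the throws are independent, so the probabilities multiply.
p_even = Fraction(3, 6)
print(p_even ** 3)            # 1/8
```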
In the case of Elbonia vs Ruritania, the other guy has some information that you don’t, perhaps for instance the transcript of an intercepted conversation between the Elbonian ruler and a nearby power assuring the former of their support against any unwarranted Ruritanian aggression: they think the war is more plausible given this information.
Further, if you agreed with that person in all other respects, i.e. if his derivation of the probability for war given all other relevant information was also 0.3 absent the interception, and you agreed on how verbal information translated into numbers, then you would have no choice but to also accept the final figure of 0.4 conditioning on the interception. Bayesian probability is presented as an exact system of inference (and Jaynes is pretty convincing on this point, I should add).
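To make that exactness concrete, here is a worked example of my own: applying Bayes’ rule in odds form to the 0.3 and 0.4 figures pins down exactly how strong the intercepted evidence must be.

```python
from fractions import Fraction

# Bayes' rule in odds form: posterior_odds = prior_odds * likelihood_ratio.
prior = Fraction(3, 10)                       # P(war) before the interception
posterior = Fraction(4, 10)                   # P(war) after conditioning on it

prior_odds = prior / (1 - prior)              # 3/7
posterior_odds = posterior / (1 - posterior)  # 2/3
likelihood_ratio = posterior_odds / prior_odds
print(likelihood_ratio)                       # 14/9: the intercept must be
                                              # 14/9 times likelier given war
```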
I agree about Jaynes and the exactness of Bayesian inference. (I haven’t read his Probability Theory fully, but I should definitely get to it sometime. I did get through the opening chapters, however, and it’s indeed mighty convincing.) Yet I honestly don’t see how either Jaynes or your comments answer my question in full, though I see no significant disagreement with what you’ve written. Let me try rephrasing my question once more.
In natural sciences, when you characterize some quantity with a number, this number must make sense in some empirical way, testable in an experiment, or at least with a thought experiment if a real one isn’t feasible in practice. Suppose that you’ve determined somehow that the temperature of a bowl of water is 300K, and someone asks you what exactly this number means in practice—why 300, and not 290, or 310, or 299, or 301? You can reply by describing (or even performing) various procedures with that bowl of water that will give predictably different outcomes depending on its exact temperature—and the results of some such procedures with this particular bowl are consistent only with a temperature of 300K plus/minus some small value that can be made extremely tiny with a careful setup, and not any other numbers.
Note that the question posed here is not how we’ve determined what the temperature of the water is in the first place. Instead, the question is: once we’ve made the claim that the temperature is some particular number, what practical observation can we make that will show that this particular number is consistent with reality, and others aren’t? If a number can’t be justified that way, then it is not something science can work with, and there is no reason to consider one value as “correct” and another “incorrect.”
So now, when I ask the same question about probability, I’m not asking about the procedures we use to derive these numbers. I’m asking: once we’ve made the claim that the probability of some event is p, what practical observations can we make that will show that this particular number is consistent with reality, and others aren’t—except by pointing to frequencies of events? I understand how we would reach a probability figure in the Elbonia vs. Ruritania scenario, I agree that Bayesian inference is an exact system, and I see what the possible sources of disagreement could be and how they should be straightened out when asymmetrical information is eliminated. I am not arguing with any of that (at least in the present context). Rather, what I’d like to know is whether the figures ultimately reached make any practical sense in terms of some observable properties of the universe, except for the frequentist ratios predicted by them. (And if the latter is the only answer, this presumably means that any sensible interpretation of probability would have to incorporate significant frequentist elements.)
That question, interesting as it is, is above my pay grade; I’m happy enough when I get the equations to line up the right way. I’ll let others tackle it if so inclined.