Perhaps it would be wiser to use complex numbers, for instance.
Perhaps it might be wiser to use measures (distributions), or measures on spaces of measures, or iterate that construction indefinitely. (The concept of hyperpriors seems to go in this direction, for example.)
But intuitively it seems very likely that if you tell me two different propositions, I can say either that one is more likely than the other or that they are equally likely. Are there any special cases where one has to answer “the probabilities are incomparable” that make you doubt that this is so?
Consider the following propositions.
P1: The recently minted U.S. quarter I just vigorously flipped into the air landed heads on the floor.
P2: A ball pulled from an unspecified urn containing an unspecified number of balls is white.
P3(x): The probability of P2 is x
Part of the problem is the laxness in specifying the language, as I mentioned. For example, if the language we use is rich enough to support self-referring interpretations, then it may not even be possible to coherently assign a truth value—or any other probability, or to know whether that is possible.
But even ruling out Goedelian potholes in the landscape and uncountably infinite families of propositions, the contrast between P1 and P2 is problematic. P1 is backed up by a vast trove of background knowledge and evidence, and our confidence in asserting Prob(P1) = 1⁄2 is very strong. On the other hand, background knowledge and evidence about P2 is virtually nil. It is reasonable as a matter of customary usage to assume the number of balls in the urn is finite, and thus that the probability of P2 is a rational number, but until you start adding in more assumptions and evidence, one’s confidence in Prob(P2) < x for any particular real number x seems typically to be very much lower than for P1. Summarizing one’s state of knowledge about these two propositions onto the same scale of reals between 0 and 1 seems to ignore an awful lot that we know about the relative state of knowledge vs. ignorance with respect to P1 and P2. An awful lot of knowledge is being jettisoned because it won’t fit into this scheme of definite real numbers. To make the claim Prob(P2) = 1⁄2 (or any other definite real number you want to name) just does not seem like the same kind of thing as the claim Prob(P1) = 1⁄2. It feels like a category mistake.
Jaynes addresses this to some degree in Appendix A4 “Comparative Probability”. He presents an argument that seems to go like this. It hardly matters very much what real number we use to start with for a statement without much background evidence, because the more evidence we accumulate, the more our assignments are coordinated with other statements into a comprehensive picture, and the probabilities eventually converge to true and correct values. That’s a heartening way to look at it, but it also goes to show that many of the assignments of specific real numbers we make, such as for P2 or P3, are largely irrelevancies that are right next door to meaningless. And in the end he reiterates his initial argument that the benefits of being able to have a real number to calculate with are irresistible. This comes at the price of helping ourselves to the illusion of more precision than our state of ignorance seems to entitle us to. This is why the axiom of comparability seems to me to make an unnatural correspondence to the way we could or should think about these things.
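To make the "the starting number hardly matters" idea concrete, here is a minimal sketch. It is not Jaynes's own argument from A4. It assumes a standard Beta prior over the fraction of white balls, draws made with replacement, and made-up counts (3 white in 10 draws, 300 white in 1000), and shows two quite different starting assignments for P2 being pulled together by the same evidence.

```python
def posterior_mean(prior_a, prior_b, white, total):
    """Posterior mean of the white fraction under a Beta(prior_a, prior_b)
    prior, after seeing `white` white balls in `total` draws (assumed to be
    made with replacement)."""
    return (prior_a + white) / (prior_a + prior_b + total)

# Two hypothetical starting assignments for Prob(P2): 0.5 and 0.8.
priors = {"flat": (1.0, 1.0), "white-leaning": (8.0, 2.0)}

# Hypothetical evidence: (number of draws, number of white balls seen).
for total, white in [(0, 0), (10, 3), (1000, 300)]:
    flat = posterior_mean(*priors["flat"], white, total)
    leaning = posterior_mean(*priors["white-leaning"], white, total)
    print(f"draws={total:5d}  flat={flat:.4f}  "
          f"white-leaning={leaning:.4f}  gap={abs(flat - leaning):.4f}")
```

The gap between the two assignments shrinks as draws accumulate. This only illustrates the convergence claim; it does not answer the objection that the initial number carried little meaning.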
Summarizing one’s state of knowledge about these two propositions onto the same scale of reals between 0 and 1 seems to ignore an awful lot
We’re getting ahead of the reading, but there’s a key distinction between the plausibility of a single proposition (i.e. a probability) and the plausibilities of a whole family of related propositions (i.e. a probability distribution).
Our state of knowledge about the coin is such that if we assessed probabilities for the class of propositions “This coin has a bias X”, where X ranges from 0 (always heads) to 1 (always tails), we would find our prior distribution to be a sharp spike centered on 1⁄2. That, technically, is what we mean by “confidence”, and formally we will be using things like the variance of the distribution.
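A rough numerical sketch of that distinction, assuming (my choice, not anything from the book) that the distribution over the bias X is a Beta distribution. The parameters 500/500 for the coin and 1/1 for the urn are purely illustrative. Both give the same single number, 1⁄2, but very different variances.

```python
def beta_mean_var(a, b):
    """Mean and variance of a Beta(a, b) distribution (standard formulas)."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var

# Illustrative assumption: sharp spike for the well-understood coin (P1),
# flat distribution for the unspecified urn (P2).
coin_mean, coin_var = beta_mean_var(500.0, 500.0)
urn_mean, urn_var = beta_mean_var(1.0, 1.0)

print(f"coin: mean={coin_mean:.3f}  variance={coin_var:.6f}")
print(f"urn:  mean={urn_mean:.3f}  variance={urn_var:.6f}")
```

On this reading, the difference between P1 and P2 shows up in the spread of the distribution over X, not in the single number assigned to the proposition.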
We’re getting ahead of the reading, but there’s a key distinction between the plausibility of a single proposition (i.e. a probability) and the plausibilities of a whole family of related propositions (i.e. a probability distribution).
Ok, that sounds helpful. But then my question is this: if we have a whole family of mutually exclusive propositions, with varying real numbers for plausibilities, about the plausibility of one particular proposition, then the assumption that that one proposition can have one specific real number as its plausibility is cast in doubt. I don’t yet see how we can have all those plausibility assignments in a coherent whole. But I’m happy to leave my question on the table if we’ll come to that part later.
If you have a mutually exclusive and exhaustive set of propositions $A_i$, each of which specifies a plausibility $P(B|A_i)$ for the one proposition B you’re interested in, then your total plausibility is $P(B)=\sum_i P(B|A_i)P(A_i)$. (Actually this is true whether or not the A’s say anything about B. But if they do, then this can be a useful way to think about P(B).)
I haven’t said how to assign plausibilities to the A’s (quick, what’s the plausibility that an unspecified urn contains one white and three cyan balls?), but this at least should describe how it fits together once you’ve answered those subproblems.
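As a toy sketch of how the pieces fit together once the subproblems are answered: the sum P(B) = Σ P(B|A_i) P(A_i) is just a weighted average. The numbers below are invented for illustration. I assume a 4-ball urn, hypotheses A_i = "exactly i balls are white", and a uniform weight of 1/5 on each, which is exactly the kind of assignment the comment leaves open.

```python
from fractions import Fraction

def total_probability(hypotheses):
    """Compute P(B) = sum_i P(B|A_i) * P(A_i).

    `hypotheses` is a list of (P(A_i), P(B|A_i)) pairs for a mutually
    exclusive and exhaustive set of A_i, so the P(A_i) must sum to 1.
    """
    if sum(p_a for p_a, _ in hypotheses) != 1:
        raise ValueError("the P(A_i) must sum to 1")
    return sum(p_a * p_b_given_a for p_a, p_b_given_a in hypotheses)

# Hypothetical setup: A_i = "exactly i of the 4 balls are white", i = 0..4,
# each given plausibility 1/5; then P(white | A_i) = i/4.
hypotheses = [(Fraction(1, 5), Fraction(i, 4)) for i in range(5)]
p_white = total_probability(hypotheses)
print(p_white)
```

Exact fractions are used so the check that the P(A_i) sum to 1 is not disturbed by floating-point rounding.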
Very interesting! But I have to read up on the Appendix A4 I think to fully appreciate it...I will come back if I change my mind after it! :-)
My own current thoughts are like this: I would bet on the ball being white up to some ratio...if my bet was $1 and I could win $100 I would do it, for instance. The probability is simply the border case where the ratio between losing and winning is such that I am indifferent between betting and not betting. Betting $50 I would certainly not do. So I would estimate the probability to be somewhere between 1 and 50%...and somewhere in between there is one and only one border case, but my human brain has difficulty thinking in such terms...
The same thing goes for the coin flip: there is some ratio at which it makes no difference whether I bet or not.
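Here is a small sketch of that "border case", under one assumed reading of the bet: you pay the stake up front and receive the full $100 if the ball is white (so the $100 includes your stake). The function names are mine. Under that reading the border probability is stake / payout, which reproduces the 1% and 50% bounds above.

```python
def expected_gain(p, stake, payout):
    """Expected net gain from paying `stake` to receive `payout` (total,
    stake included) with probability p, and nothing otherwise."""
    return p * payout - stake

def break_even_probability(stake, payout):
    """Probability at which the bet has zero expected gain."""
    return stake / payout

# The two bets from the comment, under the assumed reading.
low = break_even_probability(1, 100)    # a bet I would accept
high = break_even_probability(50, 100)  # a bet I would refuse
print(f"implied bounds on Prob(white): {low:.2f} < p < {high:.2f}")
```

If the $100 is instead read as net winnings on top of the returned stake, the border case becomes stake / (stake + winnings), and the same two bets would bound the probability between 1/101 and 1/3.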