Can you think of further desiderata for plausible inference, or find issues with the ones Jaynes lays out?
I find desideratum 1) to be poorly motivated, and a bit problematic. This is urged upon us in Chapter 1 mainly by considerations of convenience: a reasoning robot can’t calculate without numbers. But just because a calculator can’t calculate without numbers doesn’t seem a sufficient justification to assume those numbers exist, i.e., that a full and coherent mapping from statements to plausibilities exists. This doesn’t seem the kind of thing we can assume is possible; it’s the kind of thing we need to investigate to see if it’s possible.
This of course will depend on what class of statements we’ll allow into our language. I can see two ways forward on this: 1) we can assume that we have a language of statements for which desideratum 1) is true, but then we need to understand what restrictions we’ve placed on the kinds of statements that can have numerical plausibilities; or 2) we can pick a language that we want to use to talk about the world, and then investigate whether desideratum 1) can be satisfied by that language. I don’t see that this issue is touched on in Chapter 1.
There is further discussion of this in Appendix C; will this be discussed in connection with Chapter 1, or at some later time in the sequence? For example, in Appendix C, it turns out that desideratum 1 subdivides into two axioms: transitivity and universal comparability. The first one makes sense, but the second one doesn’t seem as compelling to me.
> There is further discussion of this in Appendix C; will this be discussed in connection with Chapter 1, or at some later time in the sequence? For example, in Appendix C, it turns out that desideratum 1 subdivides into two axioms: transitivity and universal comparability. The first one makes sense, but the second one doesn’t seem as compelling to me.
It is indeed an extremely interesting question! Perhaps it would be wiser to use complex numbers, for instance.
But intuitively it seems very likely that if you tell me two different propositions, I can say either that one is more likely than the other, or that they are the same. Are there any special cases where one has to answer “the probabilities are incomparable” that make you doubt that it is so?
> Perhaps it would be wiser to use complex numbers, for instance.
Perhaps it might be wiser to use measures (distributions), or measures on spaces of measures, or iterate that construction indefinitely. (The concept of hyperpriors seems to go in this direction, for example.)
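To make the measures-on-measures idea concrete, here is a minimal sketch (in Python, with made-up numbers): instead of assigning the urn proposition a single real number, we keep a small discrete distribution over candidate values of that number.

```python
import random

# Sketch: represent uncertainty about a plausibility as a distribution
# over candidate values, rather than as a single real number. Here the
# "measure on a space of measures" is just a discrete prior over
# possible values of P(ball is white); all numbers are illustrative.

candidate_probs = [0.1, 0.3, 0.5, 0.7, 0.9]  # possible values of P(white)
weights         = [0.2, 0.2, 0.2, 0.2, 0.2]  # our prior over those values

def sample_plausibility():
    """Draw one candidate value of P(white) from this hyperprior."""
    return random.choices(candidate_probs, weights)[0]

# The mean collapses back to a single point plausibility, but the
# spread of the distribution records how little we know about the urn.
mean = sum(p * w for p, w in zip(candidate_probs, weights))
print(mean)                   # 0.5 -- the point value
print(sample_plausibility())  # one draw from the hyperprior
```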
> But intuitively it seems very likely that if you tell me two different propositions, I can say either that one is more likely than the other, or that they are the same. Are there any special cases where one has to answer “the probabilities are incomparable” that make you doubt that it is so?
Consider the following propositions.
P1: The recently minted U.S. quarter I just vigorously flipped into the air landed heads on the floor.
P2: A ball pulled from an unspecified urn containing an unspecified number of balls is white.
P3(x): The probability of P2 is x
Part of the problem is the laxness in specifying the language, as I mentioned. For example, if the language we use is rich enough to support self-referring interpretations, then it may not even be possible to coherently assign a truth value, or any probability at all, or to know whether such an assignment is possible.
But even ruling out Goedelian potholes in the landscape and uncountably infinite families of propositions, the contrast between P1 and P2 is problematic. P1 is backed up by a vast trove of background knowledge and evidence, and our confidence in asserting Prob(P1) = 1⁄2 is very strong. On the other hand, background knowledge and evidence about P2 is virtually nil. It is reasonable as a matter of customary usage to assume the number of balls in the urn is finite, and thus that the probability of P2 is a rational number, but until you start adding in more assumptions and evidence, one’s confidence in Prob(P2) < x for any particular real number x seems typically to be very much lower than for P1. Summarizing one’s state of knowledge about these two propositions onto the same scale of reals between 0 and 1 seems to ignore an awful lot that we know about the relative state of knowledge vs. ignorance with respect to P1 and P2. An awful lot of knowledge is being jettisoned because it won’t fit into this scheme of definite real numbers. To make the claim Prob(P2) = 1⁄2 (or any other definite real number you want to name) just does not seem like the same kind of thing as the claim Prob(P1) = 1⁄2. It feels like a category mistake.
Jaynes addresses this to some degree in Appendix A4, “Comparative Probability”. He presents an argument that seems to go like this: it hardly matters very much what real number we start with for a statement without much background evidence, because the more evidence we accumulate, the more our assignments are coordinated with other statements into a comprehensive picture, and the probabilities eventually converge to true and correct values. That’s a heartening way to look at it, but it also goes to show that many of the assignments of specific real numbers we make, such as for P2 or P3, are largely irrelevancies that are right next door to meaningless. And in the end he reiterates his initial argument that the benefits of having a real number to calculate with are irresistible. This comes at the price of helping ourselves to the illusion of more precision than our state of ignorance entitles us to. This is why the axiom of universal comparability seems to me an unnatural fit with the way we could or should think about these things.
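For what it’s worth, the convergence claim is easy to illustrate with a toy simulation; the true bias, the prior parameters, and the use of Beta-Bernoulli updating here are my own illustrative choices, not Jaynes’ exact argument.

```python
import random

# A sketch of the Appendix A4 convergence claim: two agents with very
# different Beta priors over a coin's bias see the same 1000 flips.
random.seed(0)
true_bias = 0.7
flips = [random.random() < true_bias for _ in range(1000)]
heads = sum(flips)
tails = len(flips) - heads

# Beta(a, b) priors: "indifferent" is uniform on [0, 1];
# "opinionated" is confidently (and wrongly) centered near 0.1.
priors = {"indifferent": (1, 1), "opinionated": (2, 18)}

for name, (a, b) in priors.items():
    # Conjugate update: posterior is Beta(a + heads, b + tails).
    posterior_mean = (a + heads) / (a + b + heads + tails)
    print(name, round(posterior_mean, 3))
# Both posterior means land near 0.7: the starting number washes out.
```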
> Summarizing one’s state of knowledge about these two propositions onto the same scale of reals between 0 and 1 seems to ignore an awful lot
We’re getting ahead of the reading, but there’s a key distinction between the plausibility of a single proposition (i.e. a probability) and the plausibilities of a whole family of related propositions (i.e. a probability distribution).
Our state of knowledge about the coin is such that if we assessed probabilities for the class of propositions, “This coin has a bias X”, where X ranged from 0 (always heads) to 1 (always tails) we would find our prior distribution a sharp spike centered on 1⁄2. That, technically, is what we mean by “confidence”, and formally we will be using things like the variance of the distribution.
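Here is a minimal sketch of variance-as-confidence, using Beta distributions as stand-in priors (the parameters are illustrative): both priors below have mean 1⁄2, but their variances differ by more than two orders of magnitude.

```python
# Variance of a Beta(a, b) prior over a bias X in [0, 1]:
# Var = ab / ((a + b)^2 (a + b + 1)).

def beta_variance(a, b):
    return a * b / ((a + b) ** 2 * (a + b + 1))

coin_prior = (500, 500)  # sharp spike at 1/2: lots of background evidence
urn_prior  = (1, 1)      # uniform on [0, 1]: near-total ignorance

print(beta_variance(*coin_prior))  # ~0.00025 -- high confidence
print(beta_variance(*urn_prior))   # ~0.0833  -- low confidence
```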
> We’re getting ahead of the reading, but there’s a key distinction between the plausibility of a single proposition (i.e. a probability) and the plausibilities of a whole family of related propositions (i.e. a probability distribution).
Ok, that sounds helpful. But then my question is this: if we have a whole family of mutually exclusive propositions, with varying real numbers for plausibilities, about the plausibility of one particular proposition, then the assumption that that one proposition can have one specific real number as its plausibility is cast into doubt. I don’t yet see how we can fit all those plausibility assignments into a coherent whole. But I’m happy to leave my question on the table if we’ll come to that part later.
If you have a mutually exclusive and exhaustive set of propositions A_i, each of which specifies a plausibility P(B|A_i) for the one proposition B you’re interested in, then your total plausibility is P(B) = \sum_i P(B|A_i) P(A_i). (Actually this is true whether or not the A’s say anything about B. But if they do, then this can be a useful way to think about P(B).)
I haven’t said how to assign plausibilities to the A’s (quick, what’s the plausibility that an unspecified urn contains one white and three cyan balls?), but this at least should describe how it fits together once you’ve answered those subproblems.
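Here is a toy version of that decomposition, with made-up urn hypotheses standing in for the A’s:

```python
# Law of total probability: P(B) = sum_i P(B|A_i) * P(A_i),
# where the A_i are mutually exclusive and exhaustive.
# The urn hypotheses and their weights are made up for illustration.

hypotheses = [
    # (P(A_i): prior on this urn composition, P(white | A_i))
    (0.25, 0.00),  # urn contains no white balls
    (0.25, 0.25),  # one ball in four is white
    (0.25, 0.50),  # half the balls are white
    (0.25, 1.00),  # all balls are white
]

p_white = sum(p_a * p_b_given_a for p_a, p_b_given_a in hypotheses)
print(p_white)  # 0.4375
```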
Very interesting! But I think I have to read up on Appendix A4 to fully appreciate it... I will come back if I change my mind after reading it! :-)
My own, current, thoughts are like this: I would bet on the ball being white up to some ratio... if my bet was $1 and I could win $100, for instance, I would do it. The probability is simply the border case where the ratio between losing and winning is such that I might as well take the bet or not. Betting $50 I would certainly not do. So I would estimate the probability to be somewhere between 1% and 50%... and somewhere in between there is one and only one border case, but my human brain has difficulty thinking in such terms...
The same thing goes for the coin flip: there is some ratio below which it is rational to bet and above which it is not.
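The indifference point in this betting framing has a simple closed form: if you pay a stake s for a payout of P when the proposition turns out true, the expected value is zero exactly when the probability equals s/P. A quick sketch using the numbers from the comment above:

```python
# Implied probability at the betting indifference point: paying
# `stake` to receive `payout` if the proposition is true has zero
# expected value when p = stake / payout.

def implied_probability(stake, payout):
    return stake / payout

print(implied_probability(1, 100))   # 0.01 -- the $1 bet is good if p > 1%
print(implied_probability(50, 100))  # 0.5  -- the $50 bet needs p > 50%
```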