I think it’s an important theorem, but if you want to talk about it you need to say what the theorem actually says in math, not try to badly paraphrase it in English and then claim it’s all the justification you ever need for the Bayesian approach.
The truth is that when you rigorously state the assumptions, they’re actually pretty strong, and this fact is dodged and evaded and ignored throughout Jaynes’s treatment of the subject.
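To make the complaint concrete, here is one common (hedged) way of stating the *conclusion* of the theorem; the assumptions needed to reach it are exactly what’s in dispute:

```latex
% Sketch of the conclusion of Cox's theorem. Given a real-valued
% plausibility assignment $(A \mid C) \mapsto w(A \mid C)$ satisfying
% the theorem's assumptions, there exists a strictly monotone
% rescaling $g$ such that $p = g \circ w$ obeys the product and sum
% rules of probability:
\begin{align}
  p(AB \mid C) &= p(A \mid C)\, p(B \mid AC), \\
  p(A \mid C) + p(\bar{A} \mid C) &= 1 .
\end{align}
```

In other words, any plausibility calculus satisfying the assumptions is probability theory up to a relabeling of the scale; everything turns on whether those assumptions are innocuous or strong.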
Differentiability is strong in a mathematical sense, but I’m not sure I want a system of reasoning about plausibility that doesn’t vary smoothly with smoothly varying evidence. I guess the answer is to actually look at such systems, but I don’t have the chops to follow Halpern (or this paper that claims to prove the theorem under very weak assumptions that exclude Halpern’s counterexample).
See my reply to AlanCrowe for a more precise statement of what I was asking.
It’s not just differentiability. Why use real numbers at all? Why does P(A&B|C) have to be a function of P(A|C) and P(B|A&C)? Jaynes tries to prevent the reader from even thinking about these questions. I’m not arguing against his conclusion, but his argument is incomplete and inadequate, and he tries to cover it up.
This paper formally states all of the assumptions necessary in the proof of Cox’s theorem (R1-R5 in the paper) and notes where the controversies are before going on with the proof. R5 is obviously not well supported and the major dispute over R1 is whether plausibilities must be universally comparable. (R1 and R5 correspond to your two major objections above, in order).
As requested below, a top-level post would be very interesting.
I’m working on a post on this topic, but I don’t think I can really adequately address what I don’t like about how Jaynes presents the foundations of probability theory without presenting it myself the way I think it ought to be. And to do that I need to actually learn some things I don’t know yet, so it’s going to be a bit of a project.
In section 1.7, “The basic desiderata”, the decision to use real numbers is emphasised as one of three basic desiderata and tagged as equation 1.28. Jaynes devotes section 1.8, “Comments”, to chewing over this point for a little more than a page, before punting the issue to Appendix A. He writes:
These remarks are interjected to point out that there is a large unexplored area of possible generalizations and extensions of the theory to be developed here; perhaps this may inspire others to try their hand at developing ‘multidimensional theories’ of mental activity, which would more and more resemble the behaviour of actual human brains—not all of which is undesirable. Such a theory, if successful, might have an importance beyond our present ability to imagine.
Perhaps Jaynes is trying here to prevent the reader from even thinking about these questions, but if so his strategy is more bold and unconventional than I can fathom.
As for P(AB|C) = F[P(A|C), P(B|A&C)], that is equation 2.1. Jaynes considers an alternative in equation 2.2 and then discusses how to organize an exhaustive case split, before referring the reader interested in “carrying out this somewhat tedious analysis” to Tribus.
I confess that I have not worked through the 11 cases that Jaynes says need to be checked.
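For what it’s worth, the constraint driving that case analysis can at least be checked numerically. The following is my own sketch, not anything from the book: Boolean associativity (AB)C = A(BC) forces the combining function F to satisfy F(F(x, y), z) = F(x, F(y, z)), which the product rule passes and an arbitrary alternative (here, averaging, chosen purely for illustration) fails.

```python
import math
import random

def F(x, y):
    """Candidate combining function: the product rule."""
    return x * y

def G(x, y):
    """A non-associative candidate: averaging."""
    return (x + y) / 2

# Associativity equation forced by (AB)C = A(BC):
#   F(F(x, y), z) == F(x, F(y, z)) for all plausibilities x, y, z.
random.seed(0)
for _ in range(1000):
    x, y, z = (random.random() for _ in range(3))
    assert math.isclose(F(F(x, y), z), F(x, F(y, z)))

# Averaging violates it, so it cannot serve as F:
# G(G(0.1, 0.5), 0.9) = 0.6, but G(0.1, G(0.5, 0.9)) = 0.4.
assert not math.isclose(G(G(0.1, 0.5), 0.9), G(0.1, G(0.5, 0.9)))
```

Of course this only tests one necessary condition; the point of the theorem is the much harder claim that associativity plus the regularity assumptions pins F down to a rescaled product.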
Notice though that in graduate-level texts, dumping shit on the reader like this is standard practice. Jaynes is unusually helpful and complete for a text at this level. Compare it, for example, to Categories for the Working Mathematician. I like CftWM. MacLane takes pains to organise his material and to direct the reader’s attention to points requiring special care. Yet, following the conventions of the genre, he ends page 9 with
More explicitly, given a metacategory of objects and arrows, its arrows, with the given composition, satisfy the “arrows-only” axioms; conversely, an arrows-only metacategory satisfies the objects-and-arrows axioms when the identity arrows, defined as above, are taken as the objects. (Proof as exercise.)
Yes, Jaynes’s argument is incomplete, but by being more complete than is customary, even compared to works that are admired for their thoroughness and clarity, Jaynes has bloated his book to 727 pages. Criticising his omission of a tedious case analysis is unfair.
Perhaps Jaynes is trying here to prevent the reader from even thinking about these questions, but if so his strategy is more bold and unconventional than I can fathom.
His strategy is to make them look like trivial details, things that can be safely assumed, things that only a pedantic mathematician could care about, things that don’t matter.
As for P(AB|C) = F[P(A|C),P(B|A&C)] that is equation 2.1. Jaynes considers an alternative in equ 2.2 and then discusses how to organize an exhaustive case split....
This part, in particular, is what struck me as the most absolutely, monumentally awful part of the book. The other cases Jaynes considers in his “exhaustive case split” are only a tiny, minuscule, arbitrary set of the things that P(AB|C) might depend on. Why should P(AB|C) not depend on the specific structure of the propositions themselves?
What bothers me so much about this part of the book isn’t so much that the argument is incomplete, but that Jaynes is downright deceptive in his attempts to convince the reader that it is a complete rigorous justification for the Bayesian approach. Jaynes (and Eliezer) make it sound like Cox proved a generic Dutch book argument against anyone who doesn’t use the Bayesian approach. There may indeed be such a theorem, but Cox’s theorem just isn’t it.
The other cases Jaynes considers in his “exhaustive case split” are only a tiny, minuscule, arbitrary set of the things that P(AB|C) might depend on.
That’s a good point. I suspect that the oversight is due to the fact that the truth value of a conjunction of propositions depends only on the truth values of the constituent propositions, and not on any other structure they might have. I conjecture that the desideratum that propositions with the same truth value have the same plausibility could be used to demonstrate that P(AB|C) is not a function of any additional structure of the propositions, but Jaynes does not highlight the issue or perform any such demonstration.
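Here is a toy illustration of that truth-functionality point (my own construction, nothing from Jaynes): two propositions with different internal structure but the same truth table behave identically under conjunction, which is the intuition behind the conjectured desideratum.

```python
import itertools

# Two structurally different propositions over variables p, q that
# happen to have the same truth table: (p and q) versus, via
# De Morgan, not(not p or not q).
def A1(p, q):
    return p and q

def A2(p, q):
    return not ((not p) or (not q))

for p, q in itertools.product([False, True], repeat=2):
    # Same truth value on every assignment, despite different structure...
    assert A1(p, q) == A2(p, q)
    # ...so conjoining either one with any B yields the same truth value:
    for b in (False, True):
        assert (A1(p, q) and b) == (A2(p, q) and b)
```

This only shows truth values are structure-blind; turning it into the claim that *plausibilities* must also be structure-blind is precisely the conjectured step the comment says Jaynes never carries out.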
Thanks! I haven’t seen that one before.
I’d like to see this discussed as a top-level post. Care to take a stab at it, Smoofra?