Book Club Update, Chapter 2 of Probability Theory

Previously: Book Club introductory post—First update and Chapter 1 summary

Now that discussion of Chapter 1 has wound down, we move on to Chapter 2. (I have updated the previous post with a summary of Chapter 1, with links to the discussion as appropriate.) But first, a few announcements.

How to participate

This applies both to people who have previously registered interest and to newcomers. This spreadsheet is our best attempt at coordinating 80+ Less Wrong readers interested in participating in “earnest study of the great literature in our area of interest”.

If you are still participating, please let the group know—all you have to do is fill in the “Active (Chapter)” column. Write in an “X” if you are checked out, or the number of the chapter you are currently reading. This will let us measure attrition, as well as adapt the pace if necessary. If you would like to join, please add yourself to the spreadsheet. If you would like to participate in live chat about the material, please indicate your time zone and preferred meeting time. As always, your feedback on the process itself is more than welcome.

Refer to the previous post for more details on how to participate and meeting schedules.

Chapter 2: The Quantitative Rules

In this chapter Jaynes carefully introduces and justifies the elementary laws of plausibility, from which all later results are derived.

(Disclosure: I wasn’t able to follow all the math in this chapter but I didn’t let it deter me; the applications in later chapters are more accessible. We’ll take things slow, and draw on such expertise as has been offered by more advanced members of the group. At worst this chapter can be enjoyed on a purely literary basis.)

Sections: The Product Rule—The Sum Rule. Exercises: 2.1 and 2.2

Chapter 2 works out the consequences of the qualitative desiderata introduced at the end of Chapter 1.

The first step is to consider how to evaluate the plausibility (AB|C) from the possibly relevant inputs (B|C), (A|C), (A|BC) and (B|AC). Considerations of symmetry and the desideratum of consistency lead to a functional equation known as the “associativity equation”, F(F(x,y),z)=F(x,F(y,z)), which characterizes the function F such that (AB|C)=F[(B|C),(A|BC)]. The derivation that follows requires some calculus: by differentiating and then integrating back, it shows that the product rule must take the following form, where w is some continuous monotonic function of the plausibility:

w(AB|C)=w(A|BC)w(B|C)=w(B|AC)w(A|C)
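To get a feel for what the associativity equation demands, here is a minimal numerical sketch. It only checks two hand-picked candidates on a grid (the names `F`, `G` and `is_associative` are mine, not Jaynes's), and it is no substitute for the derivation in the book: plain multiplication, which is the product rule with w(x)=x, passes, while the arithmetic mean does not.

```python
import itertools
import math

def F(x, y):
    """Candidate combination rule: plain multiplication (the case w(x) = x)."""
    return x * y

def G(x, y):
    """A contrasting candidate chosen for illustration: the arithmetic mean."""
    return (x + y) / 2

def is_associative(f, grid, tol=1e-12):
    """Check f(f(x, y), z) == f(x, f(y, z)) on every triple drawn from grid."""
    return all(
        math.isclose(f(f(x, y), z), f(x, f(y, z)), abs_tol=tol)
        for x, y, z in itertools.product(grid, repeat=3)
    )

grid = [i / 10 for i in range(11)]
print(is_associative(F, grid))  # multiplication satisfies the equation
print(is_associative(G, grid))  # the mean does not, e.g. at (0, 0, 1)
```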

Having obtained this, the next step is to establish how (A|B) is related to (not-A|B). The functional equation in this case is

x*S(S(y)/x)=y*S(S(x)/y)

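and the derivation, after some more calculus, leads to S(x)=(1-x^m)^(1/m). The value of m turns out to be irrelevant: since w was only required to be some monotonic function of the plausibility, we can define p=w^m, which satisfies the same product rule. We end up with the following two rules: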

p(AB|C)=p(A|BC)p(B|C)=p(B|AC)p(A|C)

p(not-A|B)+p(A|B)=1
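For those who, like me, would rather check than derive, the following sketch verifies numerically that S(x)=(1-x^m)^(1/m) satisfies the functional equation for a few values of m, and that p=w^m then obeys the sum rule. The grid, the tolerance and the helper names are my own choices; points where S(y)/x would exceed 1 are skipped, since S is only applied to values in [0, 1].

```python
import math

def S(x, m):
    """The solution S(x) = (1 - x**m)**(1/m), for x in [0, 1]."""
    return (1.0 - x**m) ** (1.0 / m)

def lhs(x, y, m):
    return x * S(S(y, m) / x, m)

def rhs(x, y, m):
    return y * S(S(x, m) / y, m)

def check(m, steps=20):
    """Compare both sides on grid points safely inside the valid domain."""
    for i in range(1, steps + 1):
        for j in range(1, steps + 1):
            x, y = i / steps, j / steps
            # Skip points where S(y)/x > 1 (or within rounding error of it).
            if x**m + y**m < 1.0 + 1e-9:
                continue
            if not math.isclose(lhs(x, y, m), rhs(x, y, m), abs_tol=1e-9):
                return False
    return True

for m in (0.5, 1, 2, 3):
    print(m, check(m))

# With p = w**m, the relation w(not-A) = S(w(A)) becomes the sum rule.
w, m = 0.3, 2
print(math.isclose(S(w, m) ** m + w**m, 1.0))
```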

The exercises provide a first opportunity to explore how these two rules yield a great many other ways of assessing probabilities of more complex propositions, for instance p(C|A+B), based on the elementary probabilities.
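As an illustration of the kind of result involved, here is a sketch of one way to express p(C|A+B) using only the product rule and the generalized sum rule, checked against brute-force enumeration. The joint weights are made up for the example, the background information is left implicit, and this is not meant as the book's solution to the exercise.

```python
from fractions import Fraction

# Hypothetical joint weights over the truth values of (A, B, C).
weights = {
    (True, True, True): 2, (True, True, False): 1,
    (True, False, True): 3, (True, False, False): 2,
    (False, True, True): 1, (False, True, False): 4,
    (False, False, True): 2, (False, False, False): 5,
}
total = sum(weights.values())

def p(event):
    """Probability of the worlds (a, b, c) in which event(a, b, c) is true."""
    return sum(Fraction(w, total) for world, w in weights.items() if event(*world))

# Direct enumeration: p(C|A+B) = p(C(A+B)) / p(A+B).
direct = p(lambda a, b, c: c and (a or b)) / p(lambda a, b, c: a or b)

# Via the rules: C(A+B) = AC + BC, then p(X+Y) = p(X) + p(Y) - p(XY).
numerator = (p(lambda a, b, c: a and c) + p(lambda a, b, c: b and c)
             - p(lambda a, b, c: a and b and c))
denominator = (p(lambda a, b, c: a) + p(lambda a, b, c: b)
               - p(lambda a, b, c: a and b))
via_rules = numerator / denominator

print(direct, via_rules)  # 6/13 6/13
```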

Sections: Qualitative Properties—Numerical Values—Notation and Finite Sets Policy—Comments. Exercises: 2.3

Jaynes next turns back to the relation between “plausible reasoning” and deductive logic, showing the latter as a limiting case of the former. The weaker syllogisms shown in Chapter 1 correspond to inequalities that can be derived from the product rule, and the direction of these inequalities starts to point to likelihood ratios.
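As a sketch of how one such inequality falls out of the product rule, take the weak syllogism “if A then B; B is true; therefore A becomes more plausible”. The steps below are my own reconstruction in the p notation (the book's presentation may differ) and assume p(B|C) is nonzero:

```latex
\begin{align*}
p(A|BC)\,p(B|C) &= p(B|AC)\,p(A|C)
  && \text{(product rule, both orderings)} \\
p(A|BC) &= p(A|C)\,\frac{p(B|AC)}{p(B|C)}
  && \text{(divide by } p(B|C) > 0\text{)} \\
p(A|BC) &= \frac{p(A|C)}{p(B|C)} \;\ge\; p(A|C)
  && \text{(} A \Rightarrow B \text{ gives } p(B|AC)=1,\ \text{and } p(B|C) \le 1\text{)}
\end{align*}
```

The factor p(B|AC)/p(B|C) is where the likelihood ratios begin to show up.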

The product and sum rules allow us to consider the particular case when we have a finite set of mutually exclusive and exhaustive propositions, and background information which is symmetrical about each such proposition: it says the same about any one of them that it says about any other. Considering two such situations, where the propositions are the same but the labels we give them are different, Jaynes shows that, given our starting desiderata, we cannot do other than to assign the same probabilities to propositions which we are unable to distinguish otherwise than by their labels.

This is the principle of indifference; its significance is that even though what we have derived so far is an infinity of functions p(x) generated by the parameter m, the desiderata entirely “pin down” the numerical values in this particular situation.
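In code the principle is almost trivially short. The sketch below (function names are mine) assigns 1/n to each of n labels and uses the sum rule for mutually exclusive propositions to get the probability of a disjunction; relabeling the alternatives leaves the numbers unchanged.

```python
from fractions import Fraction

def indifference(labels):
    """Assign equal probability to n mutually exclusive, exhaustive labels."""
    labels = list(labels)
    return {label: Fraction(1, len(labels)) for label in labels}

def prob_any(assignment, subset):
    """Sum rule for exclusive propositions: add up the individual probabilities."""
    return sum((assignment[label] for label in subset), Fraction(0))

# Hypothetical example: six alternatives we cannot tell apart, labeled 1..6.
p = indifference(range(1, 7))
relabeled = indifference("abcdef")  # same situation, different labels

print(p[1])                      # 1/6
print(prob_any(p, [2, 4, 6]))    # 1/2
print(sorted(p.values()) == sorted(relabeled.values()))  # True
```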

So far in this chapter we have been using p(x) as a function relating the plausibilities of propositions, such that p(x) was an arbitrary monotonic function of the plausibility x. At this point Jaynes suggests that we “turn this around” and say that x is a function of p. These values of p, probabilities, become the primary mathematical objects, while the plausibilities “have faded entirely out of the picture. We will just have no further use for them”.

The principle of indifference now allows us to start computing numerical values for “urn probabilities”, which will be the main topic of the next chapter.

Exercise 2.3 is notable for providing a formal treatment of the conjunction fallacy.
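The core of the argument can be sketched in two lines from the product rule. This is only the upper bound on the conjunction, written in my own words rather than as a full solution to the exercise:

```latex
\begin{align*}
p(AB|C) &= p(A|BC)\,p(B|C) \;\le\; p(B|C)
  && \text{since } p(A|BC) \le 1 \\
p(AB|C) &= p(B|AC)\,p(A|C) \;\le\; p(A|C)
  && \text{since } p(B|AC) \le 1 \\
\Rightarrow\quad p(AB|C) &\le \min\bigl(p(A|C),\, p(B|C)\bigr)
\end{align*}
```

So a conjunction can never be more probable than its less probable conjunct, which is exactly what the conjunction fallacy violates.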

Chapter 2 ends with a cautionary note about infinite sets: results on them should be justified only as the “well-behaved” limit of a series of finite cases. The Comments section addresses the “subjective” vs “objective” distinction.