Book Club Update, Chapter 3 of Probability Theory
Previously: Book Club introductory post - Chapter 1 - Chapter 2
We will shortly move on to Chapter 3 (I have to post this today owing to vacation; see below). I have updated the previous post with a summary of Chapter 2, with links to the discussion where appropriate. But first, a few announcements.
How to participate
This is both for people who have previously registered interest and for newcomers. This spreadsheet is our best attempt at coordinating 90+ Less Wrong readers interested in participating in “earnest study of the great literature in our area of interest”.
If you are still participating, please let the group know: all you have to do is fill in the “Active (Chapter)” column with an “X” if you have checked out, or with the number of the chapter you are currently reading. This will let us measure attrition and adapt the pace if necessary. If you would like to join, please add yourself to the spreadsheet. If you would like to participate in live chat about the material, please indicate your time zone and preferred meeting time. As always, your feedback on the process itself is more than welcome.
Refer to the Chapter 1 post for more details on how to participate and meeting schedules.
Facilitator wanted
I’m taking off on vacation today until the end of the month. I’d appreciate it if someone would step into the facilitator’s shoes, as I will not be able to perform these duties in a timely manner for at least the next two weeks.
Chapter 3: Elementary Sampling Theory
Having derived the sum and product rules, Jaynes starts us in on a mainstay of probability theory: urn problems.
Readings for the week of 19/07: Sampling Without Replacement—Logic versus Propensity. Exercises: 3.1
Discussion starts here.
Uh. Hello? Is anybody out there?
Are people still interested in continuing? No one picked up the leadership torch as Morendil had asked, so I suppose he is well justified in not picking it up again himself. I don’t want to be a real “leader” here, but it makes sense to me to suggest reading through section 3.7 and doing exercises through 3.5 (in the book) by, say, the 29th. Someone else can suggest the next chunk if they wish.
I would request that anyone who wants to proceed respond to this comment. My guess is that we only have a half-dozen or so left, but it would be nice to know for sure.
I think you need a top-level post to really reach everyone; comments scroll by too fast for the casual reader. One lessons-learned top-level post asking the original 90 participants “what happened for you” would be appropriate; I was intending to post one on coming back, to collect suggestions for improvement.
You mentioned that you had better explanations for some Chapter 2 material. Are you still planning to post them?
Right now I’m leaning against. It is a bigger job than I want to attempt for my posting debut. Sorry. Maybe someday.
But the basic idea was mentioned in this comment and there are links to some follow-up material in some of the comments. It is not that big a deal, but it seems to me that everything becomes a little more intuitive when you are adding and subtracting “surprisals” rather than multiplying and dividing probabilities.
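For what it’s worth, a one-line sketch of the idea: writing the surprisal as $s(A \mid C) \equiv -\log p(A \mid C)$, the product rule turns multiplication of probabilities into addition of surprisals:

$$p(AB \mid C) = p(A \mid C)\, p(B \mid AC) \quad\Longleftrightarrow\quad s(AB \mid C) = s(A \mid C) + s(B \mid AC).$$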
I’m interested in continuing. I was working on the exercises you list when the study group started. Since I’m looking at the same stuff as everyone else now, and because it’s a little tougher for me, I should be more active from here on out.
Well, since it looks like there are only two or three of us, why don’t we just give it up, and proceed on our own?
Great. But that only makes two of us. Is anyone else still out there?
On vacation still, but will pick up the topic again on my return next week, in some form or other. Even if the study group has died out (there was at least one prediction to that effect), I remain interested in probability for professional reasons, and I’ll be calling on LW to help with a series of articles I intend to write on Bayesian thinking applied to project planning and task estimation.
The standard derivation of formula 3-18 in the PDF version is to construct a sample space of all ways to draw n balls, count the number of ways to draw r red balls and n-r non-red balls, and divide the latter count by the size of the sample space, on the grounds that each outcome is equally likely.
Jaynes invokes identical combinatorics, but changes his language to speak of mutually exclusive propositions and the principle of indifference instead of measuring a space.
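For reference, both routes arrive at the same hypergeometric form (assuming the usual Chapter 3 notation: N balls in the urn, M of them red, n drawn, r red among those drawn):

$$h(r \mid N, M, n) = \frac{\binom{M}{r}\binom{N-M}{n-r}}{\binom{N}{n}}.$$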
How much of frequentist probability can be transformed into Bayesian by a simple change in language? Can this be formalized into a proof that they achieve the same results where applicable?
Isn’t the equivalence of the “superstructure” implicit in that both systems satisfy (and can be derived from) the Kolmogorov axioms (Section 2.6.4 of the book)?
Of course, Jaynes claims in 2.6.4 that his version of Bayesianism goes beyond Kolmogorov (I’m guessing he is talking about things like the principle of indifference and MAXENT).
Do both systems satisfy the Kolmogorov axioms? One of them is countable additivity, right?
Of course, Kolmogorov’s is hardly the only such development. My question is: Is there an isomorphism in reasoning that also serves as a proof of the equivalence?
Are you suggesting that Jaynes is only finitely additive? I have to admit that I don’t know exactly how Jaynes’s methodological preachments about taking the limit of finite-set solutions translate into real math.
I’m not sure I understand your second paragraph either (I am only an amateur at math, and less than an amateur at analysis). But my inclination is to say, “Yes, of course there is always a possible isomorphism in the reasoning upward from a shared collection of axioms; but no, there is not an isomorphism in the reasoning or justifications advanced in choosing that set of axioms.” But I suspect I missed your point.
Incidentally, Appendix A-1 of the book includes much discussion, quite a bit of it over my head, of the relationship between Jaynes and Kolmogorov.
(Heh, I’m pretty sure being a college sophomore makes me an amateur too.)
Yep. Cox’s theorem implies only finite additivity. Jaynes makes a big point of this in many places.
I’m not asking for an isomorphism in the reasoning of choosing a set of axioms. I’m asking for an isomorphism in the reasoning involved in using them.
For large classes (all?) of problems with discrete probability spaces, this is trivial—just map a basis (in the topological sense) for the space onto mutually exclusive propositions. The combinatorics will be identical.
In Section 3.2, Jaynes talks a little about propensities, causality, and mixing forward and backward inference. Then, in a paragraph beginning with the words “More generally, consider …” he introduces the concept of “exchangeability”. I realize that he will discuss this idea in more depth later, but I have three questions that seem appropriate here.
Why does he bring this up here? I missed the connection to the paragraphs which precede this one. [Edit: Oh, I see. Previous paragraphs discussed sampling twice without replacement from a two ball urn—the simplest and most extreme case of exchangeability.]
I think I understand that exchangeability is a generalization of “independence”: in sampling with replacement, all trials are both independent and exchangeable, but when you sample without replacement, trials are no longer independent, though they are still exchangeable (a small numerical check of this appears below). The question is: is there a simple example of a series of trials which is neither independent nor exchangeable?
Am I correct that there is no example of a sequence which is independent but not exchangeable?
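To make the exchangeable-but-not-independent point concrete, here is a small brute-force check (my own sketch, on a toy urn of two red balls and one white):

```python
from fractions import Fraction
from itertools import permutations

# Toy urn: 2 red, 1 white; draw 2 without replacement.
# Treat the three balls as distinguishable and enumerate all 3! orderings,
# each equally likely; the first two positions are the two draws.
balls = ["R", "R", "W"]
outcomes = [perm[:2] for perm in permutations(balls)]

def prob(event):
    """Exact probability of an event over the equally likely orderings."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

p_r1 = prob(lambda o: o[0] == "R")       # P(red on draw 1) = 2/3
p_r2 = prob(lambda o: o[1] == "R")       # P(red on draw 2) = 2/3 (same marginal)
p_rw = prob(lambda o: o == ("R", "W"))   # P(red then white) = 1/3
p_wr = prob(lambda o: o == ("W", "R"))   # P(white then red) = 1/3 (exchangeable)
p_rr = prob(lambda o: o == ("R", "R"))   # P(red, red) = 1/3

print(p_r1, p_r2, p_rw, p_wr)
print(p_rr, p_r1 * p_r2)  # 1/3 vs 4/9: joint != product, so not independent
```

The joint probabilities are invariant under reordering, but P(red on draw 2 | red on draw 1) = 1/2 differs from the marginal 2/3, so the draws are not independent.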
Does anyone have a good intuitive explanation of the symmetry in the hypergeometric distribution, whereby you get the same probability of drawing a given number of Good balls when you exchange the number of balls drawn with the total number of Bad balls in the urn? It’s obvious to me when either the number drawn or the number of Bad balls is 1, but not in general (other than the fact that you can transform the formulas and prove they’re equal).
Shouldn’t it be for exchanging the number of Drawn and Good balls? (In the Wikipedia example, black is good and white is defective.) If the number of bad balls is 1 and the number drawn is 2, then the probability of getting 2 good balls when drawing 2 is high, but if you exchange drawn and bad, then the probability of getting 2 good balls becomes zero.
Drawn and Good are symmetric. I recommend a Venn diagram. There are two binary conditions: good vs bad and drawn vs left, both of which are effectively random. We are interested in the intersection, which is preserved by the symmetry.
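For what it’s worth, writing out the factorials also makes the Drawn/Good symmetry explicit. With N balls, K of them Good, n Drawn, and k Good among the Drawn:

$$\Pr(k) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}} = \frac{K!\; n!\; (N-K)!\; (N-n)!}{N!\; k!\; (K-k)!\; (n-k)!\; (N-K-n+k)!},$$

which is visibly unchanged when K and n trade places.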
What you said is true: you exchange the number of Drawn and Counted marbles.
However, the counted balls are white on Wikipedia (they are indeed called “defective”, so on Wikipedia people count the number of bad balls).
There was also a mistake in the part about symmetries: I replaced “Swapping the roles of black and drawn marbles” with “Swapping the roles of white and drawn marbles”, since m is the number of white marbles.
Yes, that’s right. I misremembered.
I like your argument that all the balls in the urn are labeled Good/Bad and Drawn/Not, and that the two labeling processes are causally orthogonal. But it’s not so simple as each ball being independently randomly labeled. It’s more like: sample some number of balls without replacement and mark them Good; then replace them all, and sample some number of balls without replacement and mark them Drawn. (Naturally, I mean for a full random shuffle of the balls in the urn to occur before each sample is taken.) And, as you observed, we’re asking about the distribution over the number of balls with the labels (Good, Drawn). Looking at it that way, I’m absolutely convinced. Thanks.
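For anyone who wants to see the two-labelings picture checked numerically, here is a brute-force enumeration (my own sketch; the urn size and label counts are arbitrary):

```python
from fractions import Fraction
from itertools import combinations
from math import comb

N, G, D = 7, 4, 3  # balls in the urn, marked Good, marked Drawn (arbitrary)

def overlap_dist(n_good, n_drawn):
    """Exact distribution of |Good ∩ Drawn| when the two label sets are
    chosen independently and uniformly among subsets of the given sizes."""
    counts = {}
    for good in combinations(range(N), n_good):
        for drawn in combinations(range(N), n_drawn):
            k = len(set(good) & set(drawn))
            counts[k] = counts.get(k, 0) + 1
    total = comb(N, n_good) * comb(N, n_drawn)
    return {k: Fraction(c, total) for k, c in sorted(counts.items())}

print(overlap_dist(G, D))  # distribution of the overlap k
print(overlap_dist(D, G))  # identical with the roles swapped
# Both match the hypergeometric formula:
print({k: Fraction(comb(G, k) * comb(N - G, D - k), comb(N, D))
       for k in range(max(0, G + D - N), min(G, D) + 1)})
```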
No time today to post questions on the first few sections of Chapter 3, so I’ll leave it to anyone who cares to post an appropriate opener.