Probability Space & Aumann Agreement
The first part of this post describes a way of interpreting the basic mathematics of Bayesianism. Eliezer already presented one such view at http://lesswrong.com/lw/hk/priors_as_mathematical_objects/, but I want to present another one that has been useful to me, and also show how this view is related to the standard formalism of probability theory and Bayesian updating, namely the probability space.
The second part of this post will build upon the first, and try to explain the math behind Aumann’s agreement theorem. Hal Finney had suggested this earlier, and I’m taking on the task now because I recently went through the exercise of learning it, and could use a check of my understanding. The last part will give some of my current thoughts on Aumann agreement.
Probability Space
In http://en.wikipedia.org/wiki/Probability_space, you can see that a probability space consists of a triple:
Ω – a non-empty set – usually called sample space, or set of states
F – a set of subsets of Ω – usually called sigma-algebra, or set of events
P – a function from F to [0,1] – usually called probability measure
F and P are required to have certain additional properties, but I’ll ignore them for now. To start with, we’ll interpret Ω as a set of possible world-histories. (To eliminate anthropic reasoning issues, let’s assume that each possible world-history contains the same number of observers, who have perfect memory, and are labeled with unique serial numbers.) Each “event” A in F is formally a subset of Ω, and interpreted as either an actual event that occurs in every world-history in A, or a hypothesis which is true in the world-histories in A. (The details of the events or hypotheses themselves are abstracted away here.)
To understand the probability measure P, it’s easier to first introduce the probability mass function p, which assigns a probability to each element of Ω, with the probabilities summing to 1. Then P(A) is just the sum of the probabilities of the elements in A. (For simplicity, I’m assuming the discrete case, where Ω is at most countable.) In other words, the probability of an observation is the sum of the probabilities of the world-histories that it doesn’t rule out.
A payoff of this view of the probability space is a simple understanding of what Bayesian updating is. Once an observer sees an event D, he can rule out all possible world-histories that are not in D. So, he can get a posterior probability measure by setting the probability masses of all world-histories not in D to 0, and renormalizing the ones in D so that they sum up to 1 while keeping the same relative ratios. You can easily verify that this is equivalent to Bayes’ rule: P(H|D) = P(D ∩ H)/P(D).
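To make this concrete, here is a minimal Python sketch of updating-as-renormalization; the world-histories and their weights below are made up purely for illustration:

```python
# A minimal sketch of Bayesian updating as renormalization (the world-histories
# "w1", "w2", "w3" and their weights are made up for illustration).
prior = {"w1": 0.2, "w2": 0.3, "w3": 0.5}   # the probability mass function p
D = {"w1", "w3"}                             # the observed event: w2 gets ruled out

def update(p, event):
    """Zero out world-histories outside the event, then renormalize the rest."""
    total = sum(mass for w, mass in p.items() if w in event)   # this is P(D)
    return {w: (mass / total if w in event else 0.0) for w, mass in p.items()}

posterior = update(prior, D)
# posterior == {"w1": 0.2/0.7, "w2": 0.0, "w3": 0.5/0.7}
# For a hypothesis H = {"w3"}: P(H|D) = P(D ∩ H)/P(D) = 0.5/0.7 = posterior["w3"].
```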
To sum up, the mathematical objects behind Bayesianism can be seen as
Ω – a set of possible world-histories
F – information about which events occur in which possible world-histories
P – a set of weights on the world-histories that sum up to 1
Aumann’s Agreement Theorem
Aumann’s agreement theorem says that if two Bayesians share the same probability space but possibly different information partitions, and have common knowledge of their information partitions and posterior probabilities of some event A, then their posterior probabilities of that event must be equal. So what are information partitions, and what does “common knowledge” mean?
The information partition I of an observer-moment M divides Ω into a number of subsets that are non-overlapping, and together cover all of Ω. Two possible world-histories w1 and w2 are placed into the same subset if the observer-moments in w1 and w2 have the exact same information. In other words, if w1 and w2 are in the same element of I, and w1 is the actual world-history, then M can’t rule out either w1 or w2. I(w) is used to denote the element of I that contains w.
Common knowledge is defined as follows: If w is the actual world-history and two agents have information partitions I and J, an event E is common knowledge if E includes the member of the meet I∧J that contains w. The operation ∧ (meet) means to take the two sets I and J, form their union, then repeatedly merge any of its elements (which you recall are subsets of Ω) that overlap until it becomes a partition again (i.e., no two elements overlap).
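Here is a short Python sketch of this merging procedure; the two example partitions of Ω = {1, …, 6} are made up purely for illustration:

```python
# A sketch of the meet via "merge overlapping elements until it's a partition again".
def meet(I, J):
    blocks = [set(b) for b in I] + [set(b) for b in J]
    merged = True
    while merged:
        merged = False
        for a in range(len(blocks)):
            for b in range(a + 1, len(blocks)):
                if blocks[a] & blocks[b]:          # two elements overlap:
                    blocks[a] |= blocks.pop(b)     # merge them into one
                    merged = True
                    break
            if merged:
                break
    return blocks

I = [{1, 2}, {3, 4}, {5, 6}]
J = [{1, 2, 3, 4}, {5, 6}]
print(meet(I, J))   # -> [{1, 2, 3, 4}, {5, 6}]
```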
It may not be clear at first what this meet operation has to do with common knowledge. Suppose the actual world-history is w. Then agent 1 knows I(w), so he knows that agent 2 must know one of the elements of J that overlaps with I(w). And he can reason that agent 2 must know that agent 1 knows one of the elements of I that overlaps with one of these elements of J. If he carries out this inference to infinity, he’ll find that both agents know that the actual world-history is in (I∧J)(w), and both know that the other knows it, and both know that the other knows that they know it, and so on. In other words, it is common knowledge that the actual world-history is in (I∧J)(w). Since event E occurs in every world-history in (I∧J)(w), it’s common knowledge that E occurs in the actual world-history.
The proof of the agreement theorem then goes like this. Let E be the event that agent 1 assigns a posterior probability (conditioned on everything he knows) of q1 to event A and agent 2 assigns a posterior probability of q2 to event A. If E is common knowledge at w, then both agents know that P(A | I(v)) = q1 and P(A | J(v)) = q2 for every v in (I∧J)(w). But this implies P(A | (I∧J)(w)) = q1 and P(A | (I∧J)(w)) = q2, and therefore q1 = q2. (To see this, suppose you currently know only (I∧J)(w), and you know that no matter what additional information I(v) you obtain, your posterior probability will be the same q1; then your current probability must already be q1.)
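Here is a small numerical check of that last averaging step, with made-up probability masses, a made-up event A, and two cells of I standing in for the cells inside (I∧J)(w):

```python
# If P(A | I(v)) is the same q1 for every cell I(v) inside (I∧J)(w), then
# P(A | (I∧J)(w)) = q1 too, because it is a probability-weighted average of the q1's.
p = {1: 0.1, 2: 0.1, 3: 0.4, 4: 0.4}     # masses of the world-histories in (I∧J)(w)
A = {1, 3}                                # the event A
I_cells = [{1, 2}, {3, 4}]                # the cells of I that make up (I∧J)(w)

def P(S):
    return sum(p[w] for w in S)

def cond(A, B):                           # P(A | B) = P(A ∩ B) / P(B)
    return P(A & B) / P(B)

print([cond(A, c) for c in I_cells])      # -> [0.5, 0.5]: the common value q1
print(cond(A, set().union(*I_cells)))     # -> 0.5: P(A | (I∧J)(w)) equals q1
```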
Is Aumann Agreement Overrated?
Having explained all of that, it seems to me that this theorem is less relevant to a practical rationalist than I thought before I really understood it. After looking at the math, it’s apparent that “common knowledge” is a much stricter requirement than it sounds. The most obvious way to achieve it is for the two agents to simply tell each other I(w) and J(w), after which they share a new, common information partition. But in that case, agreement itself is obvious and there is no need to learn or understand Aumann’s theorem.
There are some papers that describe ways to achieve agreement in other ways, such as iterative exchange of posterior probabilities. But in such methods, the agents aren’t just moving closer to each other’s beliefs. Rather, they go through convoluted chains of deduction to infer what information the other agent must have observed, given his declarations, and then update on that new information. (The process is similar to the one needed to solve the second riddle on this page.) The two agents essentially still have to communicate I(w) and J(w) to each other, except they do so by exchanging posterior probabilities and making logical inferences from them.
Is this realistic for human rationalist wannabes? It seems wildly implausible to me that two humans can communicate all of the information they have that is relevant to the truth of some statement just by repeatedly exchanging degrees of belief about it, except in very simple situations. You need to know the other agent’s information partition exactly in order to narrow down which element of the information partition he is in from his probability declaration, and he needs to know that you know so that he can deduce what inference you’re making, in order to continue to the next step, and so on. One error in this process and the whole thing falls apart. It seems much easier to just tell each other what information the two of you have directly.
Finally, I now see that until the exchange of information completes and common knowledge/agreement is actually achieved, it’s rational for even honest truth-seekers who share common priors to disagree. Therefore, two such rationalists may persistently disagree just because the amount of information they would have to exchange in order to reach agreement is too great to be practical. This is quite different from the understanding of Aumann agreement I had before I read the math.
I think there’s another, more fundamental reason why Aumann agreement doesn’t matter in practice. It requires each party to assume the other is completely rational and honest.
Acting as if the other party is rational is good for promoting calm and reasonable discussion. Seriously considering the possibility that the other party is rational is certainly valuable. But assuming that the other party is in fact totally rational is just silly. We know we’re talking to other flawed human beings, and either or both of us might just be totally off base, even if we’re hanging around on a rationality discussion board.
I believe Hanson’s paper on ‘Bayesian wannabes’ shows that even only partially rational agents must agree about a lot.
Jaw-droppingly (for me), that paper apparently uses “Bayesians” to refer to agents whose primary goal involves seeking (and sharing) the truth.
IMO, “Bayesians” should refer to agents that employ Bayesian statistics, regardless of what their goals are.
That Hanson casually employs this other definition without discussing the issue or defending his usage says a lot about his attitude to the subject.
I assume this just means that their primary epistemic goal is such, not that this is their utility function.
That’s why I used the word “involves”.
However, surely there are possible agents who are major fans of Bayesian statistics who don’t have the time or motive to share their knowledge with other agents. Indeed, they may actively spread disinformation to other agents in order to manipulate them. Those folk are not bound to agree with other agents when they meet them.
Won’t the utility function eventually update to match?
Maybe I lack imagination—is it possible for a strict Bayesian to do anything but seek and share the truth (assuming he is interacting with other Bayesians)?
Bayes rule is about how to update your estimates of the probability of hypotheses on the basis of incoming data. It has nothing to say about an agent’s goal, or how it behaves. Agents can employ Bayesian statistics to update their world view while pursuing literally any goal.
If you think the term “Bayesian” implies an agent whose goal necessarily involves spreading truth to other agents, I have to ask for your references for that idea.
I am looking at the world around me and at the definition of Bayesian, assuming the process has been going on in an agent for long enough for it to be properly called “a Bayesian agent”, and thinking to myself: the agent space I end up in has certain properties.
Of course, I’m using the phrase “Bayesian agent” to mean something slightly different than what the original poster intended.
Of course the agent space you end up in has certain properties—but the issue is whether those properties necessarily involve sharing the truth with others.
I figure you can pursue any goal using Bayesian statistics—including goals that include attempting to deceive and mislead others.
For example, a Bayesian public relations officer for big tobacco would not be bound to agree with other agents that she met.
You’re speaking of Bayesian agents as a general term to refer to anyone who happens to use Bayesian statistics for a specific purpose—and in that context, I agree with you. In that context, your statements are correct, by definition.
I am speaking of Bayesian agents using the idealized, Hollywood concept of agent. Maybe I should have been more specific and referred to super-agents, equivalent to super-spies.
I claim that someone who has lived and breathed the Bayes way will be significantly different than someone who has applied it, even very consistently, within a limited domain. For example, I can imagine a Bayesian super-agent working for big tobacco, but I see the probability of that event actually coming to pass as too small to be worth considering.
I don’t really know what you mean. A “super agent”? Do you really think Bayesian agents are “good”?
Since you haven’t really said what you mean, what do you mean? What are these “super agents” of which you speak? Would you know one if you met one?
Super-agent. You know, like James Bond, Mr. and Ms. Smith. Closer to the use, in context—Jeffreyssai.
Right… So: how about Lex Luthor or General Zod?
I’ve seen the paper, but it assumes the point in question in the definition of partially rational agents in the very first paragraph:
But people’s estimates generally aren’t consistent with his constraints, so even for someone who is sufficiently rational, it doesn’t make any sense whatsoever to assume that everyone else is.
This doesn’t mean Robin’s paper is wrong. It just means that faced with a topic where we would “agree to disagree”, you can either update your belief about the topic, or update your belief about whether both of us are rational enough for the proof to apply.
Assuming honesty is pretty problematical, too. In real-world disputes, participants are likely to disagree about what constitutes evidence (“the Bible says…”), aren’t rational, and suspect each other’s honesty.
Sure all by itself this first paper doesn’t seem very relevant for real disagreements, but there is a whole literature beyond this first paper, which weakens the assumptions required for similar results. Keep reading.
I already scanned through some of the papers that cite Aumann, but didn’t find anything that made me change my mind. Do you have any specific suggestions on what I should read?
Uh oh, it looks like you guys are doing the Aumann “meet” operation to update your beliefs about Aumann. Make sure to keep track of the levels of recursion...
Seen Hanson’s own http://hanson.gmu.edu/deceive.pdf—and its references?
Yes, I looked at that paper, and also Agreeing To Disagree: A Survey by Giacomo Bonanno and Klaus Nehring.
How about Scott Aaronson:
http://www.scottaaronson.com/papers/agree-econ.pdf
He shows that you do not have to exchange very much information to come to agreement. Now maybe this does not address the question of the potential intractability of the deductions to reach agreement (the wannabe papers may do this) but I think it shows that it is not necessary to exchange all relevant information.
The bottom line for me is the flavor of the Aumann theorem: that there must be a reason why the other person is being so stubborn as not to be convinced by your own tenacity. I think this insight is the key to the whole conclusion and it is totally overlooked by most disagreers.
I haven’t read the whole paper yet, but here’s one quote from it (page 5):
Scott is talking about the computational complexity of his agreement protocol here. Even if we can improve the complexity to something that is considered practical from a computer science perspective, that will still likely be impractical for human beings, most of whom can’t even multiply 3 digit numbers in their heads.
To quote from the abstract of Scott Aaronson’s paper:
“A celebrated 1976 theorem of Aumann asserts that honest, rational Bayesian agents with common priors will never ‘agree to disagree’: if their opinions about any topic are common knowledge, then those opinions must be equal.”
Even “honest, rational, Bayesian agents” seems too weak. Goal-directed agents who are forced to signal their opinions to others can benefit from voluntarily deceiving themselves in order to effectively deceive others. Their self-deception makes their opinions more credible—since they honestly believe them.
If an agent honestly believes what they are saying, it is difficult to accuse them of dishonesty—and such an agent’s understanding of Bayesian probability theory may be immaculate.
Such agents are not constrained to agree by Aumann’s disagreement theorem.
This seems to reflect human cognitive architecture more than a general fact about optimal agents or even most/all goal-directed agents. That humans are not optimal is nothing new around here, nor that the agreement theorems have little relevance to real human arguments. (I can’t be the only one to read the papers and think, ‘hell, I don’t trust myself as far as even the weakened models, much less Creationists and whatnot’, and have little use for them.)
The reason is often that you regard your own perceptions and conclusion as trustworthy and in accordance with your own aims—whereas you don’t have a very good reason to believe the other person is operating in your interests (rather than selfishly trying to manipulate you to serve their own interests). They may reason in much the same way.
Probably much the same circuitry continues to operate even in those very rare cases where two truth-seekers meet, and convince each other of their sincerity.
One question on your objections: how would you characterize the state of two human rationalist wannabes who have failed to reach agreement? Would you say that their disagreement is common knowledge, or instead are they uncertain if they have a disagreement?
ISTM that people usually find themselves rather certain that they are in disagreement and that this is common knowledge. Aumann’s theorem seems to forbid this even if we assume that the calculations are intractable.
The rational way to characterize the situation, if in fact intractability is a practical objection, would be that each party says he is unsure of what his opinion should be, because the information is too complex for him to make a decision. If circumstances force him to adopt a belief to act on, maybe it is rational for the two to choose different actions, but they should admit that they do not really have good grounds to assume that their choice is better than the other person’s. Hence they really are not certain that they are in disagreement, in accordance with the theorem. Again this is in striking contrast to actual human behavior even among wannabes.
I would say that one possibility is that their disagreement is common knowledge, but they don’t know how to reach agreement. From what I’ve learned so far, disagreements between rationalist wannabes can arise from 3 sources:
different priors
different computational shortcuts/approximations/errors
incomplete exchange of information
Even if the two rationalist wannabes agree that in principle they should have the same priors, the same computations, and full exchange of information, as of today they do not have general methods to solve any of these problems. They can only try to work out their differences on a case-by-case basis, with a high likelihood that they’ll have to give up at some point before they reach agreement.
Your suggestion of what rationalist wannabes should do intuitively makes a lot of sense to me. But perhaps one reason people don’t do it is because they don’t know that it is what they should do? I don’t recall a post here or on OB that argued for this position, for example.
You mean “common knowledge” in the technical sense described in the post?
If so, your questions do not appear to make sense.
Why not? They both know they disagree, they both know they both know they disagree, etc… Perhaps Agent 1 doesn’t know Agent 2’s partitioning, or vice versa. Or perhaps their partitionings are common knowledge, but they lack the computational ability to actually determine the meet, for example, no?
Wei was hypothesising disagreement due to an incomplete exchange of information. In which case, the parties both know that they disagree, but don’t have the time/energy/resources to sort each other’s opinions out. Then Aumann’s idea doesn’t really apply.
Aaah, okay. Though presumably at least one would know the probabilities that both assigned (and said “I disagree”…); that is, it would generally take a bit of a contrived situation for them to know they disagree, but for neither to know anything about the other’s probability other than that it’s different.
(What happens if they successfully exchange probabilities, have unbounded computing power, and have shared common-knowledge priors… but they don’t know each other’s partitioning? Or would the latter automatically be computed from the rest?)
Just one round of comparing probabilities is not normally enough for the parties involved to reach agreement, though.
Well, if they do know each other’s partitions and are computationally unbounded, then they would reach agreement after one step, wouldn’t they? (or did I misunderstand the theorem?)
Or do you mean that if they don’t know each other’s partitions, iterative exchange of updated probabilities effectively transmits the needed information?
Should people really adopt the “common knowledge” terminology? Surely that terminology is highly misleading and is responsible for many misunderstandings.
If people take common English words and give them an esoteric technical meaning that differs dramatically from a literal reading, then shouldn’t they at least capitalise them?
I too found my understanding changed dramatically when I looked into Aumann’s original paper. Basically, the result has a misleading billing—and those citing the result rarely seemed to bother explaining much about the actual result or its significance.
I also found myself wondering why people remained puzzled about the high observed levels of disagreement. It seems obvious to me that people are poor approximations of truth-seeking agents—and instead promote their own interests. If you understand that, then the existence of many real-world disagreements is explained: people disagree in order to manipulate the opinions and actions of others for their own benefit.
Sorry, I think I got a bit confused about the “meet” operation, mind clarifying?
is (I^J)(w) equal to the intersection of I(w) and J(w) (which seems to be the implied way it works based on the overall description here) or something else? (Since the definition of meet you gave involved unions rather than intersections, and some sort of merging operation)
Thanks.
EDIT: whoops. am stupid today. Meant to say intersection, not disjunction
Meet of two partitions (in the context of this post) is the finest common coarsening of those partitions.
Consider the coarsening relation on the set of all partitions of the given set. Partition A is a coarsening of partition B if A can be obtained by “lumping together” some of the elements of B. Now, for this order, a “meet” of two partitions X and Y is a partition Z such that
Z is a coarsening of X, and it is a coarsening of Y
Z is the finest such partition; that is, for any other Z’ that is a coarsening of both X and Y, Z’ is also a coarsening of Z.
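For what it’s worth, here is a small Python sketch of that ordering; the partitions of {1, 2, 3, 4} below are made up, and Z happens to be their meet:

```python
def is_coarsening(A, B):
    """True if partition A can be obtained by lumping together blocks of B."""
    return all(any(set(b) <= set(a) for a in A) for b in B)

X = [{1, 2}, {3}, {4}]
Y = [{1}, {2}, {3, 4}]
Z = [{1, 2}, {3, 4}]                              # the meet of X and Y
print(is_coarsening(Z, X), is_coarsening(Z, Y))   # -> True True
print(is_coarsening([{1, 2, 3, 4}], Z))           # -> True: the trivial partition, the
                                                  #    only coarser one, also coarsens Z
```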
Under the usages familiar to me, the common coarsening is the join, not the meet. That’s how “join” is used on the Wikipedia page for set partitions. Using “meet” to mean “common refinement” is the usage that makes sense to me in the context of the proof in the OP. [ETA: I’ve been corrected on this point; see below.]
Of course, what you call “meet” or “join” depends on which way you decide to direct the partial order on partitions. Unfortunately, it looks like both possibilities are floating around as conventions.
See for example on Wikipedia: Common knowledge (logic)
The idea is that the partitions define what each agent is able to discern, so no refinement of what a given agent can discern is possible (unless you perform additional communication). Aumann’s agreement theorem is about a condition for when the agents already agree, without any additional discussion between them.
Hmm. Then I am in a state of confusion much like Psy-Kosh’s. These opposing convention aren’t helping, but, at any rate, I evidently need to study this more closely.
It was confusing for me too, which is why I gave an imperative definition: first form the union of I and J, then merge any overlapping elements. Did that not help?
It should have. The fault is certainly mine. I skimmed your definition too lightly because you were defining a technical term (“meet”) in a context (partitions) where I was already familiar with the term, but I hadn’t suspected that it had any other usages than the one I knew.
The term “meet” would correspond to considering a coarser partition as “less” than a finer partition, which is natural enough if you see partitions as representing “precision of knowledge”. The coarser partition is able to discern less. Greatest lower bound is usually called “meet”.
It’s always called that, but the greatest lower bound and the least upper bound switch places if you switch the direction of the partial order. And there’s a lot of literature on set partitions in which finer partitions are lower in the poset. (That’s the convention used in the Wikipedia page on set partitions.)
The justification for taking the meet to be a refinement is that refinements correspond to intersections of partition elements, and intersections are meets in the poset of sets. So the terminology carries over from the poset of sets to the poset of set partitions in a way that appeals to the mathematician’s aesthetic.
But I can see the justification for the opposite convention when you’re talking about precision of knowledge.
Ah, thanks. In that case… wouldn’t the meet of A and B often end up being the entire space?
For that matter, why this coarsening operation rather than the set of all the possible pairwise intersections between members of I and members of J?
ie, why coarsening instead of “fining” (what’s the appropriate word there anyways?)
When two rationalists exchange information, shouldn’t their conclusions then sometimes be finer rather than coarser since they have, well, each gained information they didn’t have previously?
If I’ve got this right...
When two rationalists exchange all information, their new partition is the ‘join’ of the two old partitions, where the join is the “coarsest common refinement”. If you plot omega as the rectangle with corners at (-1,-1) and (1,1) and the initial partitions are the x axis for agent A and the y axis for agent B, then they share information and ‘join’ and then their common partition separates all 4 quadrants.
“Common knowledge” is the set of questions that they can both answer before sharing information. This is the ‘meet’, which is the finest common coarsening. In the previous example, there is no information that they both share, so the meet becomes the whole square.
If you extend omega down to y = −2 and modify the original partitions to both fence off this new piece on its own, then the join would be the original four squares plus this lower rectangle, while the meet would be the square from (-1,-1) to (1,1) plus this lower rectangle (since they now have this as common knowledge).
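Here’s a rough Python sketch of that setup, using the four quadrants and the lower strip as stand-in world-states (the labels and helper names are made up):

```python
# join() is the coarsest common refinement: non-empty pairwise intersections of blocks.
from itertools import product

def join(I, J):
    return [a & b for a, b in product(I, J) if a & b]

Q1, Q2, Q3, Q4, LOW = "Q1", "Q2", "Q3", "Q4", "low"
A = [{Q1, Q2}, {Q3, Q4}, {LOW}]   # agent A: above vs. below the x axis, plus the strip
B = [{Q1, Q4}, {Q2, Q3}, {LOW}]   # agent B: right vs. left of the y axis, plus the strip

print(join(A, B))   # -> the four quadrants plus {'low'}: full information exchange
# The meet (finest common coarsening, e.g. via the merging procedure in the post)
# is [{Q1, Q2, Q3, Q4}, {LOW}], matching the description above.
```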
Does this help?
wait, what? is it coarsest common fining or finest common coarsening that we’re interested in here?
And isn’t common knowledge the set of questions that not only they can both answer, but that they both know that both can answer, and both know that both know, etc etc etc?
Actually, maybe I need to reread this a bit more, but now am more confused.
Actually, on rereading, I think I’m starting to get the idea about meet and common knowledge (given that before exchanging info, they do know each other’s partitioning, but not which particular element of the partition the other has observed to be the current one).
Thanks!
Nope; it’s the limit of I(J(I(J(…(w)…)))), where I(S) for a set S is the union of the elements of I that have nonempty intersections with S, i.e. the union of I(x) over all x in S, and J(S) is defined the same way.
Alternately if instead of I and J you think about the sigma-algebras they generate (let’s call them sigma(I) and sigma(J)), then sigma(I meet J) is the intersection of sigma(I) and sigma(J). I prefer this somewhat because the machinery for conditional expectation is usually defined in terms of sigma-algebras, not partitions.
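Here’s a small Python sketch of that characterization; the two partitions at the end are made up:

```python
# Starting from {w}, repeatedly expand by I and then by J until the set stops changing.
def expand(partition, S):
    """I(S): the union of the blocks of the partition that intersect S."""
    return set().union(*(block for block in partition if block & S))

def meet_cell(I, J, w):
    S = {w}
    while True:
        T = expand(J, expand(I, S))
        if T == S:
            return S
        S = T

I = [{1, 2}, {3, 4}, {5, 6}]
J = [{2, 3}, {4, 5}, {1}, {6}]
print(meet_cell(I, J, 1))   # 1 -> {1,2} -> {1,2,3} -> ... -> {1,2,3,4,5,6}
```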
Then… I’m having trouble seeing why I^J wouldn’t very often converge on the entire space.
ie, suppose a super simplification in which both agent 1 and agent 2 partition the space into only two parts, agent 1 partitioning it into I = {A1, B1}, and agent 2 partitioning into J = {A2, B2}
Suppose I(w) = A1 and J(w) = A2
Then, unless the two partitions are identical, wouldn’t (I^J)(w) = the entire space? or am I completely misreading? And thanks for taking the time to explain.
That simplification is a situation in which there is no common knowledge. In world-state w, agent 1 knows A1 (meaning knows that the correct world is in A1), and agent 2 knows A2. They both know A1 union A2, but that’s still not common knowledge, because agent 1 doesn’t know that agent 2 knows A1 union A2.
I(w) is what agent 1 knows, if w is correct. If all you know is S, then the only thing you know agent 1 knows is I(S), and the only thing that you know agent 1 knows agent 2 knows is J(I(S)), and so forth. This is why the usual “everyone knows that everyone knows that … ” definition of common knowledge translates to I(J(I(J(…(w)…)))).
Well, how is it not the intersection then?
ie, Agent 1 knows A1 and knows that Agent 2 knows A2
If they trust each other’s rationality, then they both know that w must be in A1 and be in A2
So they both conclude it must be in intersection of A1 and A2, and they both know that they both know this, etc etc...
Or am I missing the point?
As far as I understand, agent 1 doesn’t know that agent 2 knows A2, and agent 2 doesn’t know that agent 1 knows A1. Instead, agent 1 knows that agent 2’s state of knowledge is in J and agent 2 knows that agent 1’s state of knowledge is in I. I’m a bit confused now about how this matches up with the meaning of Aumann’s Theorem. Why are I and J common knowledge, and {P(A|I)=q} and {P(A|J)=q} common knowledge, but I(w) and J(w) are not common knowledge? Perhaps that’s what the theorem requires, but currently I’m finding it hard to see how I and J being common knowledge is reasonable.
Edit: I’m silly. I and J don’t need to be common knowledge at all. It’s not agent 1 and agent 2 who perform the reasoning about I meet J, it’s us. We know that the true common knowledge is a set from I meet J, and that therefore if it’s common knowledge that agent 1’s posterior for the event A is q1 and agent 2’s posterior for A is q2, then q1=q2. And it’s not unreasonable for these posteriors to become common knowledge without I(w) and J(w) becoming common knowledge. The theorem says that if you’re both perfect Bayesians and you have the same priors then you don’t have to communicate your evidence.
But if I and J are not common knowledge then I’m confused about why any event that is common knowledge must be built from the meet of I and J.
Then agent 1 knows that agent 2 knows one of the members of J that have nonempty intersection with I(w), and similarly for agent 2.
Presumably they have to tell each other which of their own partitions w is in, right? ie, presumably SOME sort of information sharing happens about each other’s conclusions.
And, once that happens, it seems like the intersection of I(w) and J(w) would be their resultant common knowledge.
I’m confused still though what the “meet” operation is.
Unless… the idea is something like this: they exchange probabilities. Then agent 1 reasons “J(w) is a member of J such that it both intersects I(w) AND would assign that particular probability, so I can determine the subset of I(w) that intersects with those” and determines a probability from there. And similarly for agent 2. Then they exchange probabilities again, and go through an equivalent reasoning process to tighten the spaces a bit more… and the theorem ensures that they’d end up converging on the same probabilities? (Each time they state unequal probabilities, they each learn more information, and each one then comes up with a set that’s a strict subset of the one they were previously considering, but each of their sets always contains the intersection of I(w) and J(w).)
Try a concrete example: Two dice are thrown, and each agent learns one die’s value. In addition, each learns whether the other die is in the range 1-3 vs 4-6. Now what can we say about the sum of the dice?
Suppose player 1 sees a 2 and learns that player 2’s die is in 1-3. Then he knows that player 2 knows that player 1’s die is in 1-3. It is common knowledge that the sum is in 2-6.
You could graph it by drawing a 6x6 grid and circling the information partition of player 1 in one color, and player 2 in another color. You will find that the meet is a partition of 4 elements, each a 3x3 grid in one of the corners.
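Here’s a rough Python sketch of this example (helper names made up), building both information partitions over the 36 outcomes and computing their meet by the merging procedure from the post:

```python
# Ω is the 36 (die1, die2) pairs; agent 1 learns die1 and whether die2 is low (1-3)
# or high (4-6); agent 2 learns the mirror image.
from itertools import product

omega = list(product(range(1, 7), repeat=2))
half = lambda x: "lo" if x <= 3 else "hi"

def partition_by(key):
    cells = {}
    for w in omega:
        cells.setdefault(key(w), set()).add(w)
    return list(cells.values())

I = partition_by(lambda w: (w[0], half(w[1])))   # agent 1's information partition
J = partition_by(lambda w: (w[1], half(w[0])))   # agent 2's information partition

def meet(I, J):
    """Merge overlapping blocks of I and J until the result is a partition again."""
    blocks = [set(b) for b in I] + [set(b) for b in J]
    merged = True
    while merged:
        merged = False
        for a in range(len(blocks)):
            for b in range(a + 1, len(blocks)):
                if blocks[a] & blocks[b]:
                    blocks[a] |= blocks.pop(b)
                    merged = True
                    break
            if merged:
                break
    return blocks

M = meet(I, J)
print(len(M), sorted(len(c) for c in M))   # -> 4 [9, 9, 9, 9]: the four 3x3 corners
```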
In general, anything which is common knowledge will limit the meet—that is, the meet partition the world is in will not extend to include world-states which contradict what is common knowledge. If 2 people disagree about global warming, it is probably common knowledge what the current CO2 level is and what the historical record of that level is. They agree on this data and each knows that the other agrees, etc.
The thrust of the theorem though is not what is common knowledge before, but what is common knowledge after. The claim is that it cannot be common knowledge that the two parties disagree.
What I don’t like about the example you provide is: what player 1 and player 2 know needs to be common knowledge. For instance if player 1 doesn’t know whether player 2 knows whether die 1 is in 1-3, then it may not be common knowledge at all that the sum is in 2-6, even if player 1 and player 2 are given the info you said they’re given.
This is what I was confused about in the grandparent comment: do we really need I and J to be common knowledge? It seems so to me. But that seems to be another assumption limiting the applicability of the result.
Not sure… what happens when the ranges are different sizes, or otherwise the type of information learnable by each player is different in non symmetric ways?
Anyways, thanks, upon another reading of your comment, I think I’m starting to get it a bit.
Different size ranges in Hal’s example? Nothing in particular happens. It’s ok for different random variables to have different ranges.
Otoh, if the players get different ranges about a single random variable, then they could have problems. Suppose there is one d6. Player A learns whether it is in 1-2, 3-4, or 5-6. Player B learns whether it is in 1-3 or 4-6.
And suppose the actual value is 1.
Then A knows it’s 1-2. So A knows B knows it’s 1-3. But A reasons that B reasons that if it were 3 then A would know it’s 3-4, so A knows B knows A knows it’s 1-4. But A reasons that B reasons that A reasons that if it were 4 then B would know it’s 4-6, so A knows B knows A knows B knows it’s 1-6. So there is no common knowledge, i.e. I∧J=Ω. (Omitting the argument w, since if this is true then it’s true for all w.)
And if it were a d12, with ranges still size 2 and 3, then the partitions line up at one point, so the meet stops at {1-6, 7-12}.
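And a quick Python check of both cases, growing a set from the actual value by repeatedly absorbing every overlapping block (helper names made up):

```python
# The fixed point of this growth is (I∧J)(w).
def meet_cell(partitions, w):
    S = {w}
    while True:
        T = set(S)
        for P in partitions:
            for block in P:
                if block & T:
                    T |= block
        if T == S:
            return S
        S = T

def chunks(n, size):
    """Partition {1,...,n} into consecutive ranges of the given size."""
    return [set(range(i, i + size)) for i in range(1, n + 1, size)]

print(meet_cell([chunks(6, 2), chunks(6, 3)], 1))    # -> {1,...,6}: the meet is all of Ω
print(meet_cell([chunks(12, 2), chunks(12, 3)], 1))  # -> {1,...,6}: the partitions line up at 6
```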
I’m not sure I understand how Ω represents the set of world histories. If world histories were to live anywhere, they’d live in the sigma algebra — as collections of events, per the definition. If not, and every element of Ω truly is a world history, then how can F represent “information about which events occur in which possible world-histories”, when each f ∈ F is made up of atoms from Ω, that is, when every element in F is a collection of world histories? One of these definitions ought to be recast, I believe. It might be most sensible to make Ω the set of all possible events across all possible histories; that way you can largely keep your other definitions as-is.
Also, I am not sure the following claim is true: “which assigns a probability to each element of Ω, with the probabilities summing to 1”. It *is* true that every sigma algebra must contain Ω, and typically P(Ω)=1. But P acts on F, not Ω, and of course not every atom in Ω must occur in F. Since you preface with the claim that you write this partly as an exercise to check understanding of the underlying ideas, I would kindly suggest considering a read-through of chapter 2 of Pollard’s excellent “User’s Guide to Measure Theoretic Probability”. It might clear up some of these matters.
Interesting that the problems with Aumann’s theorem were pointed out ten years ago, but belief in it continues to be prevalent.
Diagrams would be wonderful, anyone up to drawing them?
I think that I understand this proof now. Does the following dialogue capture it?
AGENT 1: My observations establish that our world is in the world-set S. However, as far as I can tell, any world in S could be our world.
AGENT 2: My observations establish that our world is in the world-set T. However, as far as I can tell, any world in T could be our world.
TOGETHER: So now we both know that our world is in the world-set S ∩ T—though, as far as we can tell, any world in S ∩ T could be our world. Therefore, since we share the same priors, we both arrive at the same value when we compute P(E | S ∩ T), the probability that a given event E occurred in our world.
ETA: janos’s comment indicates that I’m missing something, but I don’t have the time this second to think it through. Sounds like the set that they ultimately condition on isn’t S ∩ T but rather a subset of it.
ETA2: Well, I couldn’t resist thinking about it, even though I couldn’t spare the time :). The upshot is that I don’t understand janos’s comment, and I agree with Psy-Kosh. As stated, for example, in this paper:
From this it follows that the element of I∧J containing w is precisely I(w) ∩ J(w). So, unless I’m missing something, my dialogue above completely captures the proof in the OP.
ETA3: It turns out that both possible ways of orienting the partial order relation are in common use. Everything that I’ve seen discussing the theory of set partitions puts refinements lower in the lattice. This was the convention that I was using above. But, as Vladimir Nesov points out, it’s natural to use the opposite convention when talking about epistemic agents, and this is the usage in Wei Dai’s post. The clash between these conventions was a large part of the cause of my confusion. At any rate, under the convention that Wei Dai is using, the element of I∧J containing w is not in general I(w) ∩ J(w).
Your dialog is one way to achieve agreement, and it is what I meant when I said “simply tell each other I(w) and J(w)”; however, it is not what Aumann’s proof is about. The dialog shows that two Bayesians with the same prior would always agree if they exchange enough information.
Aumann’s proof is not really about how to reach agreement, but why disagreements can’t be “common knowledge”. The proof follows a completely different structure from your dialog.
No, this is wrong. Please edit or delete it to avoid confusing others.
The implication that I asserted is correct. The confusion arises because both possible ways of orienting the partial order on partitions are common in the literature. But I’ll note that in the comment.
The problem is not in conventions and the literature, but in whether your interpretation captures the statement of the theorem discussed in the post. Ambiguity of the term is no excuse. By the way, “meet” is Aumann’s usage as well, as can be seen from the first page of the original paper.
Indeed. I plead guilty to reading hastily. I saw the term “meet” being used in a context where I already knew its definition (the only definition it had, so far as I knew), so I only briefly skimmed Wei Dai’s own definition. Obviously I was too careless.
However, it really bears emphasizing how strange it is to put refinements higher in the partial order of partitions, at least from the perspective of the general theory of partial orders. Under the category theoretic definition of partial orders, P ≤ Q means that there is a map P → Q. Now, to say that a partition Q is a coarsening of a partition P is to say that Q is a quotient P/~ of P. But such a quotient corresponds canonically to a map P → Q sending each element p of P to the equivalence class in Q containing p. Indeed, Wei Dai is invoking just such maps when he writes “I(w)”. In this case, Ω is construed as the discrete partition of itself (where each element is in its own equivalence class) and I is used (as an abuse of notation) for the canonical map of partitions I: Ω → I. The upshot is that one of these canonical partition maps P → Q exists if and only if Q is a coarsening of P. Therefore, that is what P ≤ Q should mean. In the context of the general theory of partial orders, coarser partitions should be greater than finer ones.
Efforts to illuminate Aumann’s disagreement result do seem rather rare—thanks for your efforts here.
seconded!
It appears to me that reducing this to an equation is totally irrelevant, in that it obscures the premises of the argument, and an argument is only as good as the reliability of the premises. Moreover, the theorem appears faulty based on inductive logic, in that the premises can be true and the conclusion false. I’m really interested in why this thought process is wrong.
While I see your point, I wouldn’t say that the agreement issue is over rated at all.
There are many disagreements that don’t change at all over arbitrarily many iterations, which sure don’t look right given AAT. Even if the beliefs don’t converge exactly, I don’t think it’s too much to ask for some motion towards convergence.
I think the more important parts are the parts that talk about predicting disagreements.
Could robust statistics be relevant for explaining fixed points where disagreements do not change at all?
Roughly speaking, the idea of robust statistics is that the median or similar concepts may be preferable in some circumstances to the mean—and unlike the mean, the median routinely does not change at all, even when another datapoint changes.
I don’t think that really helps. If you’re treating someones beliefs as an outlier, then you’re not respecting that person as a rationalist.
Even if you did take the median of your metaprobability distribution (which is not the odds you want to bet on, though you may want to profess them for some reason), eventually you should change your mind (most bothersome disagreements involve people confidently on opposite sides of the spectrum so the direction in which to update is obvious).
It could be that in practice most people update beliefs according to some more “robust” method, but to the extent that it freezes their beliefs under new real evidence, it’s a sucky way of doing it and you don’t get a ‘get out of jail free’ card for doing it.
The main problem I have always had with this is that the reference set is “actual world history” when in fact that is the exact thing that observers are trying to decipher.
We all realize that there is in fact an “actual world history”; however, if it were known, then this wouldn’t be an issue. Using it as a reference set, then, seems spurious in all practicality.
I think that summation is a good way to interpret the problem I addressed in as practical a manner as is currently available; I would note however that most people arbitrarily weight observational inference, so there is a skewing of the data.
The sad part about the whole thing is that both or all observers exchanging information may be the same deviation away from w, such that their combined probabilities of I(w) are further away from w than either individually.
Huh? The reference set Ω is the set of possible world histories, out of which one element is the actual world history. I don’t see what’s wrong with this.
I suppose my post was poorly worded. Yes, in this case omega is the reference set for possible world histories.
What I was referring to was the baseline of w as an accurate measure. It is a normalizing reference, though not a set.