Don’t Double-Crux With Suicide Rock
Honest rational agents should never agree to disagree.
This idea is formalized in Aumann’s agreement theorem and its various extensions (we can’t foresee to disagree, uncommon priors require origin disputes, complexity bounds, &c.), but even without the sophisticated mathematics, a basic intuition should be clear: there’s only one reality. Beliefs are for mapping reality, so if we’re asking the same question and we’re doing everything right, we should get the same answer. Crucially, even if we haven’t seen the same evidence, the very fact that you believe something is itself evidence that I should take into account—and you should think the same way about my beliefs.
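For reference, the core claim can be stated compactly (this is the standard partition-model formulation, not notation from the post itself): if two agents share a common prior and their posterior probabilities for an event $E$, computed from their respective private information $\mathcal{I}_1$ and $\mathcal{I}_2$, are common knowledge between them, then those posteriors must be equal:

$$q_1 = P(E \mid \mathcal{I}_1),\quad q_2 = P(E \mid \mathcal{I}_2),\quad q_1, q_2 \text{ common knowledge} \implies q_1 = q_2.$$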
In “The Coin Guessing Game”, Hal Finney gives a toy model illustrating what the process of convergence looks like in the context of a simple game about inferring the result of a coinflip. A coin is flipped, and two players get a “hint” about the result (Heads or Tails) along with an associated hint “quality” uniformly distributed between 0 and 1. Hints of quality 1 always match the actual result; hints of quality 0 are useless and might as well be another coinflip. Several “rounds” commence where players simultaneously reveal their current guess of the coinflip, incorporating both their own hint and its quality, and what they can infer about the other player’s hint quality from their behavior in previous rounds. Eventually, agreement is reached. The process is somewhat alien from a human perspective (when’s the last time you and an interlocutor switched sides in a debate multiple times before eventually agreeing?!), but not completely so: if someone whose rationality you trusted seemed visibly unmoved by your strongest arguments, you would infer that they had strong evidence or counterarguments of their own, even if there was some reason they couldn’t tell you what they knew.
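Finney describes the game in prose; the following is a rough Python sketch of one way to simulate it (my own illustrative reconstruction, not Finney's code). Hint qualities are discretized to a small grid so that each player can exactly simulate the other's common-knowledge update rule when interpreting their announcements:

```python
# Illustrative reconstruction of Finney's coin guessing game (not his code).
import random
from functools import lru_cache

HEADS, TAILS = 1, 0
QUALITIES = tuple(i / 20 for i in range(21))  # hint qualities on a grid

def hint_likelihood(hint, coin, quality):
    """P(hint | coin, quality): the hint matches the coin with probability (1+q)/2."""
    return (1 + quality) / 2 if hint == coin else (1 - quality) / 2

@lru_cache(maxsize=None)
def announce(my_hint, my_quality, their_past, my_past):
    """A player's announced guess this round, given their private hint and
    quality and both players' past announcements (tuples of HEADS/TAILS)."""
    p_heads = total = 0.0
    for coin in (HEADS, TAILS):
        for their_hint in (HEADS, TAILS):
            for their_quality in QUALITIES:
                # Rule out hypothesized opponents who wouldn't have announced
                # what we actually observed in earlier rounds.
                if any(announce(their_hint, their_quality,
                                my_past[:t], their_past[:t]) != g
                       for t, g in enumerate(their_past)):
                    continue
                w = (0.5  # uniform prior on the coin
                     * hint_likelihood(my_hint, coin, my_quality)
                     * hint_likelihood(their_hint, coin, their_quality))
                total += w
                if coin == HEADS:
                    p_heads += w
    return HEADS if p_heads >= total / 2 else TAILS

def sample_game(rounds=4):
    coin = random.choice((HEADS, TAILS))
    qualities = [random.choice(QUALITIES) for _ in (0, 1)]
    hints = [coin if random.random() < (1 + q) / 2 else 1 - coin
             for q in qualities]
    history = ((), ())
    for _ in range(rounds):
        guesses = tuple(announce(hints[i], qualities[i],
                                 history[1 - i], history[i])
                        for i in (0, 1))
        history = (history[0] + (guesses[0],), history[1] + (guesses[1],))
        print("announcements:", guesses, "truth:", coin)

if __name__ == "__main__":
    sample_game()
```

Over a handful of rounds the announcements typically stop changing, and in runs where the hints disagree and the qualities are close you can sometimes see a player switch sides before settling, as in Finney's description.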
Honest rational agents should never agree to disagree.
In “Disagree With Suicide Rock”, Robin Hanson discusses a scenario where disagreement seems clearly justified: if you encounter a rock with words painted on it claiming that you, personally, should commit suicide according to your own values, you should feel comfortable disagreeing with the words on the rock without fear of being in violation of the Aumann theorem. The rock is probably just a rock. The words are information from whoever painted them, and maybe that person did somehow know something about whether future observers of the rock should commit suicide, but the rock itself doesn’t implement the dynamic of responding to new evidence.
In particular, if you find yourself playing Finney’s coin guessing game against a rock with the letter “H” painted on it, you should just go with your own hint: it would be incorrect to reason, “Wow, the rock is still saying Heads, even after observing my belief in several previous rounds; its hint quality must have been very high.”
Honest rational agents should never agree to disagree.
Human so-called “rationalists” who are aware of this may implicitly or explicitly seek agreement with their peers. If someone whose rationality you trusted seemed visibly unmoved by your strongest arguments, you might think, “Hm, we still don’t agree; I should update towards their position …”
But another possibility is that your trust has been misplaced. Humans suffering from “algorithmic bad faith” are on a continuum with Suicide Rock. What matters is the counterfactual dependence of their beliefs on states of the world, not whether they know all the right keywords (“crux” and “charitable” seem to be popular these days), nor whether they can perform the behavior of “making arguments”—and definitely not their subjective conscious verbal narratives.
And if the so-called “rationalists” around you suffer from correlated algorithmic bad faith—if you find yourself living in a world of painted rocks—then it may come to pass that protecting the sanctity of your map requires you to master the technique of lonely dissent.
Presumably double-cruxing with Suicide Rock would reveal that the Rock doesn’t have any cruxes, and double-cruxing with someone suffering from algorithmic bad faith would also reveal that, though perhaps more subtly?
You are a bit too quick to allow the reader the presumption that they have more algorithmic faith than the other folks they talk to. Yes, if you are super rational and they are not, you can ignore them. But how did you come to be confident in that description of the situation?
Everything I’m saying is definitely symmetric across persons, even if, as an author, I prefer to phrase it in the second person. (A previous post included a clarifying parenthetical to this effect at the end, but this one did not.)
That is, if someone who trusted your rationality noticed that you seemed visibly unmoved by their strongest arguments, they might think that the lack of agreement implies that they should update towards your position, but another possibility is that their trust has been misplaced! If they find themselves living in a world of painted rocks where you are one of the rocks, then it may come to pass that protecting the sanctity of their map would require them to master the technique of lonely dissent.
You could argue that my author’s artistic preference to phrase things in the second person is misleading, but I’m not sure what to do about that while still accomplishing everything else I’m trying to do with my writing: my reply to Wei Dai and a Reddit user’s commentary on another previous post seem relevant.
Being able to parse philosophical arguments is evidence of being rational. When you make philosophical arguments, you should think of yourself as only conveying content to those who are rationally parsing things, and conveying only appearance/gloss/style to those who aren’t rationally parsing things.
Uh, we are talking about holding people to MUCH higher rationality standards than the ability to parse philosophical arguments.
I think being smart is only very weak evidence of being rational (especially globally rational, as Zack is assuming here, rather than locally rational).
I think most of the evidence that being able to understand philosophical arguments provides about being rational is screened off by being smart (which, again, is already only very weakly correlated with being rational).
In practical terms, agreeing to disagree can simply mean that, given resource constraints, reaching convergence on this topic isn’t worth the delta in expected payoffs.
I frequently find myself in situations where:
1) I disagree with someone
2) My opinion is based on a fairly large body of understanding accumulated over many years
3) I think I understand where the other person is going wrong
4) Trying to reach convergence would, in practice, look like a pointless argument that would only piss everyone off.
If there are real consequences at stake, I’ll speak up. Often I’ll have to take it offline and write a few pages, because some positions are too complex for most people to follow orally. But if the agreement isn’t worth the argument, I probably won’t.
And if the problem formulation is much simpler than the solution then there will be a recurring explanatory debt to be paid down as multitudes of idiots re-encounter the problem and ignore existing solutions.
This is what FAQs are for. On LW, The Sequences are our FAQ.
I think this is an important consideration for boundedly rational agents, and even more so for embedded agents, one which is unfortunately often ignored. The result is that you should not expect to ever meet an agent where Aumann fully applies in all cases, because neither of you has the computational resources necessary to always reach agreement.
Honest rational agents can still disagree if the fact that they’re all honest and rational isn’t common knowledge.
Yep. “Mutually trusting” would be better than “honest”.
Go one step further.
There are no such agents. On many topics, NOBODY, including you and including me, is sufficiently honest OR sufficiently rational for Aumann’s theorem to apply.
The other problem with Aumann’s agreement theorem is that it’s often applied too broadly. It should say, “Honest rational agents should never agree to disagree on matters of fact.” What to do about those facts is definitely up for disagreement, insofar as two honest, rational agents may value wildly different things.
An earlier draft actually specified “… on questions of fact”, but I deleted that phrase because I didn’t think it was making the exposition stronger. (Omit needless words!) People who understand the fact/value distinction, instrumental goals, &c. usually don’t have trouble “relativizing” policy beliefs. (Even if I don’t want to maximize paperclips, I can still have a lawful discussion about what the paperclip-maximizing thing to do would be.)
I understand the point about omitting needless words, but I think the words are needed in this case. I think there’s a danger here of Aumann’s agreement theorem being misused to prolong disagreements when those disagreements are on matters of values and future actions rather than on the present state of the world. This is especially true in “hot” topics (like politics, religion, etc) where matters of fact and matters of value are closely intertwined.
A slightly different frame on this (I think less pessimistic) is something like “honesty hasn’t been invented yet”. Or, rather, explicit knowledge of how to implement honesty does not exist in a way that can be easily transferred. (Tacit knowledge of such may exist but it’s hard to validate and share)
(I’m referring, I think, to the same sort of honesty Zack is getting at here, along with the aspects of it that are relevant to double-crux, which didn’t come up in that previous blog post.)
I think, obviously, that there have been massive strides (across human history, and yes on LW in particular) in how to implement “Idealized Honesty” (for lack of a better term for now). So, the problem seems pretty tractable. But it does not feel like a thing within spitting distance.
The kind of honesty Zack is talking about is desirable, but it’s unclear whether it’s sufficient for Aumann’s theorem to apply.
The conditions [1] are sufficient for the conclusion [2] (as shown by [4]) but are not all necessary.
Honesty is not required.
If this is surprising, then it might be useful to consider that ‘having common priors’ is kind of like being able to read people’s minds—what they are thinking will be within the space of possibilities you consider. Things such rational agents say to each other may be surprising, but never un-conceived of; never inconceivable. And with each (new) piece of information they acquire they come closer to the truth—whether the words they hear are “true” or “false” matters not; what matters is only what evidence ‘hearing those words’ constitutes. Under such circumstances lies may be useless. Not because rational agents are incapable of lying, but because they possess impossible computational abilities that ensure convergence of shared beliefs* (in their minds) after they meet—a state of affairs which does not tell you anything about their words.
Events may proceed in a fashion such that a third observer (one that isn’t a “rational agent”), such as you or I, might say “they agreed to disagree”. Aumann’s agreement theorem doesn’t tell us that this will never happen, only that such an observer would be wrong about what actually happened to the agents’ (internal) beliefs, however much their professed (or performed) beliefs suggest otherwise.
One consequence of this is how such a conversation might go—the rational agents might simply state the probabilities they give for a proposition, rather than discussing the evidence, because they can assess the evidence from each other’s responses, since they already know all the evidence that ‘might be’.
*Which need not be at or between where they started. Two Bayesians whose different evidence has led each to believe something is very unlikely may, after meeting, conclude that it is very likely—if that is the assessment they would give had they had both pieces of information to begin with.
If an agent is not honest, ey can decide to say only things that provide no evidence regarding the question in hand to the other agent. In this case convergence is not guaranteed. For example, Alice assigns probability 35% to “will it rain tomorrow” but, when asked, says the probability is 21% regardless of what the actual evidence is. Bob assigns probability 89% to “will it rain tomorrow” but, when asked, says the probability is 42% regardless of what the actual evidence is. Alice knows Bob always answers 42%. Bob knows Alice always answers 21%. If they talk to each other, their probabilities will not converge (they won’t change at all).
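To spell out the arithmetic behind this (my restatement, not part of the original comment): Bob’s report is a constant, hence statistically independent of the weather, so by Bayes’ theorem hearing it leaves Alice’s probability unchanged:

$$P_{\text{Alice}}(\text{rain} \mid \text{Bob says } 42\%) = \frac{P(\text{Bob says } 42\% \mid \text{rain})}{P(\text{Bob says } 42\%)} \cdot P_{\text{Alice}}(\text{rain}) = 1 \cdot 0.35 = 0.35,$$

and symmetrically Bob stays at $0.89$.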
Yes, it can luckily happen that the lies still contain enough information for them to converge, but I’m not sure why you seem to think it is an important or natural situation.
I don’t think the ‘rational agents’ in question are a good model for people, or that the theoretical situation is anything close to natural. Aside from the myriad ways they are different*, the result of ‘rational people’** interacting seems like an empirical question. Perhaps a theory that models people better will come up with the same results—and offer suggestions for how people can improve.
The addition of the word “honest” seems like it comes from an awareness of how the model is flawed. I pointed out how this differs from the model, because the model is somewhat unintuitive and makes rather large assumptions—and it’s not clear how well the result holds up as the gap between those assumptions and reality is closed.
Yes, to ask for a theory that enables constructing, or approximating, the agents described therein would be asking for a lot, but that might clearly establish how the theory relates to reality/people interacting with each other.
*like having the ability to compute uncomputable things instantly (with no mistakes),
**Who are computationally bounded, etc.
The addition of the word “honest” doesn’t come from an awareness of how the model is flawed. It is one of the explicit assumptions in the model. So, I’m still not sure what point are you going for here.
I think that applying Aumann’s theorem to people is mostly interesting in the prescriptive rather than descriptive sense. That is, the theorem tells us that our ability to converge can serve as a test of our rationality, to the extent that we are honest and share the same prior, and all of this is common knowledge. (This last assumption might be the hardest to make sense of. Hanson tried to justify it but IMO not quite convincingly.) Btw, you don’t need to compute uncomputable things, much less instantly. Scott Aaronson derived a version of the theorem with explicit computational complexity and query complexity bounds that don’t seem prohibitive.
Given all the difficulties, I am not sure how to apply it in the real world and whether that’s even possible. I do think it’s interesting to think about it. But, to the extent it is possible, it definitely requires honesty.
Important note about Aumann’s agreement theorem: both agents have to have the same priors. With human beings this isn’t always the case, especially when it comes to values. But even with perfect Bayesian reasoners it isn’t always the case, since their model of the world is their prior. Two Bayesians with the same data can disagree if they are reasoning from different causal models.
Now with infinite data, abandonment of poorly performing models, and an Occam prior, it is much more likely that they will agree. But that’s not mathematically guaranteed, AFAIK.
It’s a good heuristic in practice. But don’t draw strong conclusions from it without corroborating evidence.
The usual formalization of “Occam’s prior” is the Solomonoff prior, which still depends on the choice of a Universal Turing Machine, so such agents can still disagree because of different priors.
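A small worked example of the different-priors point (illustrative numbers, not from the comments above): suppose both agents see the same evidence $e$ and agree on the likelihoods $P(e \mid H) = 0.9$ and $P(e \mid \neg H) = 0.3$, but start from priors $P_1(H) = 0.5$ and $P_2(H) = 0.1$. Then

$$P_1(H \mid e) = \frac{0.9 \times 0.5}{0.9 \times 0.5 + 0.3 \times 0.5} = 0.75, \qquad P_2(H \mid e) = \frac{0.9 \times 0.1}{0.9 \times 0.1 + 0.3 \times 0.9} = 0.25.$$

Both update correctly on identical data and still land on opposite sides of even odds; only a common prior (or enough further shared data) closes the gap.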
I just want to spring off of this to point out something about Aumann’s agreement theorem. I often see it used as a kind of cudgel because people miss an important aspect.
It can take us human beings time and effort to converge on a view.
Oftentimes it’s just not worth it to one or more of the participants to invest that time and effort.
But there’s no agreement about what constitutes evidence of something being real, so even agreement about fact is going to be extremely difficult.
“Honest rational agents should never agree to disagree.”
I never really looked into Aumann’s theorem. But can one not envisage a situation where they “agree to disagree”, because the alternative is to argue indefinitely?
The title of Aumann’s paper is just a pithy slogan. What the slogan means, as the title of his paper, is the actual mathematical result that he proves: if two agents have the same priors but have made different observations, then if they share only their posteriors, each properly updates on the other’s posterior, and they repeat, they will approach agreement without ever having to share the observations themselves. In other papers there are theorems placing practical bounds on the number of iterations required.
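A minimal sketch of that exchange-posteriors-and-update process (an illustrative setup of my own, not from this comment): two agents share a prior over a coin’s bias, each privately observes one flip, and they trade posteriors rather than observations. Here one exchange happens to suffice; in general it can take more rounds, as in the coin guessing game above.

```python
# Illustrative sketch: agreement by exchanging posteriors, not observations.
BIASES = (0.25, 0.75)  # common prior: each bias equally likely

def posterior_high_bias(flips):
    """P(bias = BIASES[1] | flips), where flips is a tuple of 0/1 outcomes."""
    def likelihood(bias):
        p = 1.0
        for f in flips:
            p *= bias if f == 1 else 1 - bias
        return p
    num = 0.5 * likelihood(BIASES[1])
    return num / (num + 0.5 * likelihood(BIASES[0]))

flip_a, flip_b = 1, 0                     # the agents' private observations
p_a = posterior_high_bias((flip_a,))      # 0.75
p_b = posterior_high_bias((flip_b,))      # 0.25
# Each announced posterior uniquely reveals the hidden flip (0.75 <-> heads,
# 0.25 <-> tails), so each agent can condition on both flips without ever
# being shown the other's observation -- and the posteriors now agree.
p_a = p_b = posterior_high_bias((flip_a, flip_b))  # 0.5 for both
print(p_a, p_b)
```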
In actual human interaction, there is a large number of ways in which disagreements among us may fall outside the scope of this theorem. Inaccuracy of observation. All the imperfections of rationality that may lead us to process observations incorrectly. Non-common priors. Inability to articulate numerical priors. Inability to articulate our observations in numerical terms. The effort required may exceed our need for a resolution. Lack of good faith. Lack of common knowledge of our good faith.
Notice that these are all imperfections. The mathematical ideal remains. How to act in accordance with the eternal truths of mathematical theorems when we lack the means to satisfy their hypotheses is the theme of a large part of the Sequences.
No, the alternative (and only outcome for honest rational agents) is to converge to one belief. Each takes the other’s stated (and mutually known to be honest and rational) beliefs as evidence, on which they update their own.