Say More, Justify Less
Related to: Rational Me or We?
Human brains reach decisions using complicated, often opaque, mechanisms. Rational behavior requires learning about systematic failures of these mechanisms, but also requires learning to effectively use judgments for which our brains provide no communicable justification. If we cannot trust our brains just because we don’t understand them, our beliefs will be too slow to follow reality. This is a critically important lesson for individual rationality, but there is an even more important analog for collective rationality.
Aumann’s agreement theorem provides conditions under which rationalists cannot agree to disagree. This result is often cited to justify protracted arguments regarding contentious facts. These arguments generally proceed as an exchange of well-specified evidence and reasoned justification. Aumann’s agreement theorem says nothing about the effectiveness of such exchanges, and in my experience such arguments almost always end either quickly or unsuccessfully.
Why don’t rational arguments end well? Sometimes our beliefs depend on a few pieces of easily identified evidence, but often they depend on the balance of a large quantity of (potentially forgotten) evidence colored by our experience and aggregated using intuition. Sometimes our beliefs depend on simple easily articulated deductions, but often they depend on subtle inferences we don’t consciously understand. Arguments fail because the real sources of our beliefs cannot be easily communicated, or even well understood. If we rely only on reasoning and evidence that can be communicated in clean and conclusive arguments, our beliefs will be too slow to follow reality.
Why can’t we reach agreement by updating on the beliefs of others, as per Aumann’s theorem? Why do we need to describe how we arrived at our beliefs, instead of simply stating them? In fact in ordinary conversation no one even tries to use Aumann-style negotiation, and from my perspective this seems completely rational. Even when talking to people whose rationality I trust, assuming common knowledge of rationality or honesty is never close to accurate; if we abandon skepticism we can expect to be consistently wrong (even when we aren’t deliberately manipulated).
The real sources of our belief are hard to communicate and we aren’t sufficiently perfect rationalists to use Aumann agreement: if we want to reach rational consensus, we need to do some work.
In what cases should we be able to agree? I am not surprised, for example, that Robin Hanson and Eliezer Yudkowsky cannot reconcile their beliefs about the future of AIs / emulations: there is little available evidence regarding the reliability of anyone’s predictions about the distant future. But when an organization or society is repeatedly faced with questions about the near future it is possible to learn which individuals’ beliefs are trustworthy, and to reach rational consensus, without relying on perfect rationality. I believe that this problem should be a priority for a rational organization.
First: How?
A. Commit to precise predictions.
Individuals should commit to predictions about everything which matters to the organization and is amenable to prediction. Estimate quantitative measures of the success of a program, of demand for a service, of future revenue, of future income from donations, of relevant political changes, etc. Make predictions quickly and publicly, update on the predictions of others, and experiment with trusting different intuitions.
B. Track prediction quality.
Minimally, we would like the beliefs of an organization to be as good as the beliefs of its best member. Don’t just have members state predictions: measure their success as well. Use scoring rules to distribute trust to the most trustworthy agents. Record-keeping doesn’t need to be inefficient; make it fast, cultivate an atmosphere of keeping score and committing to errors, and build a record which can help individuals improve their own calibration and learn how they should be updating on the beliefs of others. Incentivize accurate prediction. Concentrate authority where it has been empirically effective, or where trusted group members predict it to be effective.
Require predictors to stake part of their reputation on predictions. The best scheme I know is Hanson’s market scoring rule, which rewards or punishes predictors based on their contribution to the collective probability estimate. When predictors need incentives, real or play money can be used to incentivize accuracy. When predictors are rationally interested in the success of the group, purely artificial reputation systems may be enough. In any case, disagreement is still possible. But if disagreements are frequent and pronounced and there is any difference in prediction quality, the difference will rapidly manifest itself in a difference of reputation (and if there is no difference, then by convexity an average of the two disagreeing predictors will outperform either and this fact will rapidly become common knowledge).
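The convexity remark above can be checked numerically. The following is a minimal sketch, assuming the logarithmic score is the scoring rule in use; the forecast record is hypothetical and constructed so that the two forecasters are equally good but disagree on every question.

```python
import math

def log_score(p, happened):
    """Log of the probability the forecaster assigned to what actually happened."""
    return math.log(p if happened else 1.0 - p)

# Hypothetical record: (forecaster A's probability, forecaster B's probability, outcome).
record = [
    (0.9, 0.6, True),
    (0.6, 0.9, True),
    (0.2, 0.5, False),
    (0.5, 0.2, False),
]

score_a = sum(log_score(a, y) for a, b, y in record)
score_b = sum(log_score(b, y) for a, b, y in record)
# A third "forecaster" who always reports the midpoint of A and B.
score_avg = sum(log_score((a + b) / 2, y) for a, b, y in record)

print(score_a, score_b, score_avg)
```

Because the logarithm is concave, the averaged forecast always scores at least as well as the mean of the two individual scores. When the two individual scores are equal, as in this constructed record, the averaged forecast therefore does at least as well as either forecaster.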
C. Chain prediction mechanisms to obtain faster feedback.
When an expensive prediction mechanism is available, use a cheaper prediction mechanism to predict the output of the expensive mechanism. This can provide more rapid feedback, and allow accurate predictions for much broader classes of events. For example:
If I want to estimate the success of an advertising campaign or interface design choice, implementing it and then measuring the success of the overall program is expensive. Carefully implemented focus groups or A/B tests can predict the ultimate success of a choice; once the predictive power of such tests is established, the results of those tests can be predicted instead of the eventual success of the project. This may provide much faster feedback. Continuing recursively, we may learn that certain individuals are good at predicting the outcome of a focus group or A/B test. Peers can then predict the prediction of these individuals. This gives even faster feedback, and can free up time for that individual. Since the attention of people with accurate beliefs and good planning skills is very important, this is an important consideration.
D. Substitute testable assertions.
Sometimes we are interested in vague or otherwise problematic assertions, such as “In a year, you’ll regret this decision.” When it’s important, choose related precise statements that are amenable to formal prediction. Some examples:
Choose a trusted observer and a future time, and bet about the judgment of that observer in hindsight. For example, rather than betting on whether “50% of humans will use social networks regularly in 10 years”, choose a knowledgeable arbiter, offer them a moderate incentive to cooperate, and bet on whether they would agree with the statement “50% of humans use social networks regularly” in 10 years.
Imagine a possible outcome with precisely defined observable consequences, and make bets conditional on the realization of that outcome. For example, maybe it will only be clear whether or not the building is structurally sound if someone conducts a formal investigation. Then instead of betting on “The building foundation meets this standard”, bet on “If the building foundation is formally investigated, it will be found to meet this standard.”
Find other concrete markers (polls, surveys, price movements, financial outcomes, etc.) that correlate with the event in question and predict them instead.
E. Put a price on ideas.
The success of a proposed change—whether a new program, or a structural change—can be predicted as well as anything else. Such predictions can be used to estimate the value (in money, play money, reputation) of a new proposal and reward or penalize individuals for their contribution. This may incentivize members if that is needed, but more importantly it allows the organization to intelligently trade-off the time required to dismiss or test bad changes against the benefit of good changes. The judgment of whoever implements change can provide the feedback needed to train correct predictions about the value of a proposed change (see C above); allowing more organization members to engage in such predictions can conserve the time and attention of authority.
Once such predictions are accurate, ideas can be aggregated (as natural language proposals) automatically, and most proposals can be discarded without explicit discussion or significant time investment. Serious deliberation is only necessary for the ideas which are predicted to be most likely to be worth considering.
Second: Technical facts.
A. Proper scoring rules.
A scoring rule takes as arguments a predicted probability distribution and an actual outcome and outputs a score. A scoring rule is proper if honestly reporting your best estimate of the real probability distribution maximizes your expected score. A typical scoring rule with desirable properties is the logarithm of the probability assigned by the distribution to the actual outcome. The maximum score of 0 is attained by predicting the result with certainty. A score of -log n (that is, log(1/n)) is attained by the uniform distribution over n outcomes. Other scoring rules may be appropriate, depending on the consequences of an incorrect prediction.
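As an illustration of the logarithmic rule and of what "proper" means, here is a minimal sketch. The function names and the example distributions are my own assumptions, chosen only for demonstration.

```python
import math

def log_score(probs, outcome):
    """Logarithmic score: log of the probability assigned to the realized outcome."""
    return math.log(probs[outcome])

def expected_log_score(true_probs, reported_probs):
    """Expected log score of a report when outcomes are drawn from true_probs.
    Assumes the report gives nonzero probability to every possible outcome."""
    return sum(p * math.log(q) for p, q in zip(true_probs, reported_probs) if p > 0)

certain = [1.0, 0.0, 0.0]
uniform = [1 / 3, 1 / 3, 1 / 3]
print(log_score(certain, 0))  # certainty about the right outcome: the best possible score
print(log_score(uniform, 0))  # uniform over 3 outcomes: log(1/3)

# Properness: if the true distribution is `truth`, reporting it honestly
# beats both hedging toward uniform and exaggerating confidence.
truth = [0.7, 0.2, 0.1]
honest = expected_log_score(truth, truth)
hedged = expected_log_score(truth, uniform)
overconfident = expected_log_score(truth, [0.98, 0.01, 0.01])
print(honest, hedged, overconfident)
```

Two spot checks do not prove properness, but they show the pattern: any report other than the true distribution lowers the expected score.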
B. Market scoring rules.
A scoring rule evaluates the quality of a single prediction. Robin Hanson has proposed market scoring rules to evaluate several individuals’ contributions to a collective probability distribution. A market scoring rule works as follows: the collective probability distribution is initialized to a reasonable estimate, and any group member can modify the collective estimate at any time. When the prediction is resolved, each person who changed the probability distribution is rewarded or punished by the difference between the score associated to the probability distribution which they proposed and the score associated to the probability distribution which they replaced. Market scoring rules can be simulated with a prediction market with an automated market maker (an artificial market participant who always offers to buy/sell at a price determined by the history of trades).
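The payout rule just described can be sketched in a few lines. This is only an illustration, assuming the logarithmic score as the underlying scoring rule; the participant names and the sequence of edits are hypothetical.

```python
import math

def log_score(probs, outcome):
    return math.log(probs[outcome])

def market_payouts(initial, edits, outcome):
    """edits: chronological list of (name, new_distribution).
    Each edit earns score(proposed distribution) - score(distribution it replaced)."""
    payouts = {}
    current = initial
    for name, proposed in edits:
        delta = log_score(proposed, outcome) - log_score(current, outcome)
        payouts[name] = payouts.get(name, 0.0) + delta
        current = proposed
    return payouts

# Hypothetical binary question; outcome 0 turns out to be true.
initial = [0.5, 0.5]
edits = [
    ("alice", [0.7, 0.3]),  # moves the estimate toward the truth
    ("bob", [0.4, 0.6]),    # moves it away
    ("alice", [0.8, 0.2]),  # corrects bob's move
]
payouts = market_payouts(initial, edits, outcome=0)
print(payouts)
```

The individual rewards telescope: their sum equals the score of the final distribution minus the score of the initial one, so whoever funds the rewards has a total cost that depends only on how far the group moved the estimate, not on how many edits were made.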
C. Multiplicative Weights / Exp3
When consulting a group of experts (or choosing from among several possible strategies) a mathematically robust strategy is given by multiplicative weights (or Exp3): choose an advisor at random with probability proportional to a quantitative estimate of their reputation, and adjust the reputation of each expert by an amount proportional to the impact of that expert’s advice on your expected performance. This technique does about as well as the best expert, neglecting rare extremely bad or extremely good outcomes.
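A minimal sketch of the multiplicative weights idea follows. It implements the full-information variant (often called Hedge), in which every expert's loss is observed each round; Exp3 handles the case where only the chosen expert's loss is seen, and is not shown here. The learning rate `eta`, the loss data, and the function name are assumptions made for illustration.

```python
import math
import random

def hedge(loss_rounds, eta=0.5, seed=0):
    """loss_rounds: list of rounds, each a list of per-expert losses in [0, 1].
    Returns final weights, the realized loss of the sampled choices,
    and the expected loss of the randomized strategy."""
    rng = random.Random(seed)
    n = len(loss_rounds[0])
    weights = [1.0] * n
    realized = 0.0
    expected = 0.0
    for losses in loss_rounds:
        total = sum(weights)
        # Follow one expert, chosen with probability proportional to its weight.
        chosen = rng.choices(range(n), weights=weights)[0]
        realized += losses[chosen]
        expected += sum(w / total * l for w, l in zip(weights, losses))
        # Shrink each expert's weight multiplicatively according to its loss.
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    return weights, realized, expected

# Hypothetical experts: wrong 10%, 50%, and 90% of the time respectively.
rounds = [
    [1.0 if t % 10 == 0 else 0.0, float(t % 2), 0.0 if t % 10 == 0 else 1.0]
    for t in range(100)
]
weights, realized, expected = hedge(rounds)
print(weights, realized, expected)
```

In this constructed example the best expert accumulates a loss of 10 over 100 rounds, and the expected loss of the randomized strategy stays within a small additive margin of that, which is the sense in which the technique does about as well as the best expert.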
Third: Why Bother?
How important is rational agreement? Why can’t we use traditional organizational designs to make decisions in the absence of consensus, or arrive at consensus by more traditional mechanisms? My belief is that traditional structures do not result in rational behavior, in the same way that relying exclusively on human intuition does not result in rational behavior. Traditional organizations rely too extensively on the beliefs of upper management, they cannot make important decisions or implement change quickly because of a lack of elite attention, they fail to aggregate information about their own behavior, they fail to match the best beliefs of their constituents about important questions of fact, and they fail to encourage or select for rational behavior among their constituents. The sources of my belief are complicated and hard to explain. Here are some words which might help others arrive at similar conclusions:
I do not consider the non-adaptation of modern organizations compelling evidence because I have little faith in their rational behavior (circularly). I do not consider the non-adaptation of managers compelling evidence because I have little faith in the ability of most individuals to enact significant change, especially in the face of considerable organizational and social inertia. I do not consider anything from the academic literature in this field relevant, because it appears to suffer from persistently irrational methodology and general uselessness. I believe that intelligence and rationality vary remarkably even among elites, and do not accept the weak efficient market hypothesis applied to entrepreneurial endeavors because it seems to be obviously false. I have excellent evidence for the collective irrationality of society as a whole, and of historical institutions. I believe that I have little similar evidence for modern corporations only because I know more about history (and the details of most modern corporations are opaque to me). From my limited evidence, modern corporations do also behave irrationally: they fail to even hold collective beliefs on critical questions, their decisions are determined by artifacts of their authority structure, and they remain successful despite the existence of significant corrections to their behavior (I know two or three case studies, none of which I should discuss publicly; I only have any evidence at all because of indirect connections to management). If I abandon typical arguments for the efficacy of modern corporations, the sum of evidence against it is compelling. I put considerable weight on the beliefs of Robin Hanson, who has said many true and uncommon things and really does seem to update on the beliefs of others.
But I think the best argument is: I am willing to stake a lot of money on the belief that rational organizations, in the sense I describe, can consistently out-compete traditional organizations. Is anyone interested in betting against me?
You should cite existing prediction-judging websites—e.g. those listed on Wikipedia.
I suspect that with each link in the chain, you get a much less accurate and stable result.
Yes, but if the cost reduction and speed improvement is enough it can still be worthwhile. “A good answer now is better than a perfect answer next week” and so forth. It doesn’t take a doctorate in physics to know that an initially stationary, unsupported rock less than ten meters above the earth’s surface will hit that surface within the next second or two.
For that matter, consider some sort of craftsmanship where masters traditionally employ apprentices. The apprentice can’t do master-level work (yet) but is still in training, and can improve the master’s performance by performing necessary menial tasks and allowing the boss to concentrate on things that demand mastery.
It is interesting that you place such strong emphasis on tracking prediction quality, while other people who have looked at this question have emphasized the importance of anonymity. I might suggest that your proposals ignore the fact that the individual participants are not perfect Bayesians, but rather are imperfect human beings with biases and arational emotions—like pride, obstinacy, etc.
It seems that we can structure our collective decision-making institutions either so as to mask common human weaknesses or to exploit the power of ideal individual rationalism. Or perhaps we can strike a balance and do both.
The corporate ‘self-help’ literature is almost as voluminous as the personal genre. Quite a bit of it clusters around ideas related to CMM and CMMI.
I think if your obstinacy consistently causes you to do worse than a less obstinate imitator and you actually see the effect, you may be able to get over it. How rational do you have to be for this to work? I strongly suspect I am well above the cutoff, and many LW readers are as well. I would be interested to learn that I am wrong.
If the implications in this literature are actually representative of the behavior of dominant corporations, then my confidence (in the ability of rationalists to do much better) is increased significantly. My uncertainty mostly comes from the (I think likely) possibility that there is a significant disconnect between this literature and the actual behavior of corporations.
Why do you expect this to happen?
I think you’re saying that if some masterful predictors exist, then their casting a prediction will mostly determine the outcome, sufficient for lesser people to (with a little uncertainty) learn whether they predicted well.
I suspect that for any given event, some people are better at predicting it than others. For example, in the case of UI design it is definitely the case that some people are much better than others at predicting what will do well in an A/B test. If such people are scarce, learning to predict their predictions is a good substitute for learning to predict the results of A/B tests themselves. (The real question in this situation is whether the overhead of prediction could be low enough for this to matter.)
An important point
In regards to checking the quality of predictions—this may be harder than it sounds because it requires understanding underlying conditions.
I’ve heard a plausible theory that investors can look better than they are—for a while—if their temperament or theories happen to match the way the market is going.
A poker player using optimal play can still have a long run of bad luck.
How long are we talking exactly? Long runs of bad (or good) luck are rather improbable!
Two and a half years. I don’t know that he’s using optimal strategy, but I’m assuming that he’s at least competent.
Long runs of bad luck are improbable, but there are a lot of poker players and they’re playing a lot of games.
I’ve seen enough mentions of runs of bad luck which last for months that I felt it was reasonable to make my previous comment. On the other hand, googling turns up relatively little on the subject, so I’m lowering my estimate of the probability.
Whenever anyone talks about a run of bad luck lasting that long, I assume that they’ve shifted their threshold such that ordinary luck is no longer good enough.
It’s also possible that they aren’t playing as well.
It seems as though someone should have done the math on the likelihood of very long stretches of bad luck in poker, but a casual search didn’t turn anything up.
Tell me more about how the bet would be set up, and I might be interested.
I mean, you might try to set up a rational business firm to dislodge the incumbents in some existing industry. But the failure of such an upstart wouldn’t prove that rational firms are not ceteris paribus better, only that their advantages are not greater than the advantages of being an incumbent.
The best way to stake that money would surely be to found such a rational organisation yourself.
I agree. But though I would be willing to stake money on the question now, it is probably worth it to spend some time improving my estimate first.
This is an interesting problem. I agree that Aumann-style updating in practice will lead to less correct beliefs systematically, but you don’t seem to analyse why it is so. My hunch is that the reason is our inability to judge other people’s rationality well, but I’d like to know what you think about it. The explanation
doesn’t tell much.
Aumann agreement requires knowing one’s priors and posteriors. Actually knowing, i.e. being able to state the actual numbers. But no-one can do this.
No one can do this exactly, but why isn’t some approximation effective? To update in a Bayesian way we need to know our priors too, and not being able to state the numbers precisely isn’t seen as a reason for not using Bayesian updating in a wide class of practical situations.
I would guess the problem isn’t approximation, it’s the common knowledge no one has. You can’t approximate this, and if anyone suspects it may not be completely true (which they should, since it isn’t) the result completely falls apart.
I forgot the other requirement, and the more onerous one, for Aumann agreement: the two people’s priors must already agree. This is absolutely unrealistic.
Strong Bayesians may say that there is a unique universal prior that every perfect Bayesian reasoner must have, but until that prior can be exhibited, Aumann agreement must remain a mirage. No-one has exhibited that prior.
Perfectly updating on the basis of others’ opinions is rare; I have never seen anyone who purports to do it correctly (I can’t update correctly on the basis of almost anything). Common knowledge of this ability seems impossible to come by. Common knowledge that neither of you is trying to manipulate the other (or speaking for signaling reasons) is also completely non-existent: I have never been in an environment where more than 2 or 3 levels of non-manipulation were known.
Aumann agreement requires common knowledge of priors. Since we don’t even know our own priors, Aumann agreement is not possible.
While I believe that prediction is a (very useful) rationalist skill, it’s not the filter I would use to evaluate rationality. I suppose I think the skill is too difficult by several levels, and then too specialized. I don’t expect someone in the top .05% to know better what is going to happen, but I expect them to know better what to do when it happens.
… this is just based on a ‘large quantity of forgotten evidence colored by our experience and aggregated using intuition’ but then I consider some concrete examples to see if they fit...
.. I don’t expect a rational president to predict what will happen in Libya, but I expect him to have a good idea of which political theories to apply to the situation to make the best outcome most probable. It seems to be a different, weird kind of intelligence to anticipate that, say, one personality will form an alliance with another personality and they will cause event X that determines the outcome.
.. I had a friend in college who was very smart and she’s a vet now; I expect her to be able to figure out whatever is wrong with whatever animal comes to her clinic, including problems she hasn’t seen before, but I don’t expect she’d be able to predict much about anything she hasn’t seen before. She’d be able to make some educated guesses based on what she knows, but, again, something she doesn’t anticipate could easily eclipse her expectation.
I am not suggesting that prediction is the filter. Predictions of a certain kind (will policy change X be good or bad for the organization? Should we use our funds in this way or that way?) are a necessary part of doing business. If you want to know what is wrong with an animal who has come into a clinic, you want a prediction (either about the results of examinations, or about responses to treatment). Somehow an organization needs to make that prediction: it can give the authority to one person, it can let several people argue about it, etc.
Wholly agree. Further, even recognized good counter-arguments may not sway immediately if the idea being replaced has been integrated into a larger networked structure of (possibly accurate) beliefs. It takes time to tangle out a new network that is free of the false idea. I find that it usually takes an attack from several different directions to let go of a wrong idea that is well-integrated.
I’ve noticed this too. People simply state lots of beliefs and people are just expected to nod if they agree or disagree. If they’re not sure whether they agree or disagree, then they can ask questions along the lines of, ‘why do you think that?’. So that the Aumann process only proceeds in marginal cases, when someone is likely to be swayed. This seems efficient.