Say More, Justify Less

Related to: Rational Me or We?

Human brains reach decisions using complicated, often opaque, mechanisms. Rational behavior requires learning about systematic failures of these mechanisms, but also requires learning to effectively use judgments for which our brains provide no communicable justification. If we cannot trust our brains just because we don’t understand them, our beliefs will be too slow to follow reality. This is a critically important lesson for individual rationality, but there is an even more important analog for collective rationality.

Aumann’s agreement theorem provides conditions under which rationalists cannot agree to disagree. This result is often cited to justify protracted arguments regarding contentious facts. These arguments generally proceed as an exchange of well-specified evidence and reasoned justification. Aumann’s agreement theorem says nothing about the effectiveness of such exchanges, and in my experience such arguments almost always end either quickly or unsuccessfully.

Why don’t rational arguments end well? Sometimes our beliefs depend on a few pieces of easily identified evidence, but often they depend on the balance of a large quantity of (potentially forgotten) evidence colored by our experience and aggregated using intuition. Sometimes our beliefs depend on simple easily articulated deductions, but often they depend on subtle inferences we don’t consciously understand. Arguments fail because the real sources of our beliefs cannot be easily communicated, or even well understood. If we rely only on reasoning and evidence that can be communicated in clean and conclusive arguments, our beliefs will be too slow to follow reality.

Why can’t we reach agreement by updating on the beliefs of others, as per Aumann’s theorem? Why do we need to describe how we arrived at our beliefs, instead of simply stating them? In fact, in ordinary conversation no one even tries to use Aumann-style negotiation, and from my perspective this seems completely rational. Even when talking to people whose rationality I trust, assuming common knowledge of rationality or honesty is never close to accurate; if we abandon skepticism we can expect to be consistently wrong (even when we aren’t deliberately manipulated).

The real sources of our belief are hard to communicate and we aren’t sufficiently perfect rationalists to use Aumann agreement: if we want to reach rational consensus, we need to do some work.

In what cases should we be able to agree? I am not surprised, for example, that Robin Hanson and Eliezer Yudkowsky cannot reconcile their beliefs about the future of AIs / emulations: there is little available evidence regarding the reliability of anyone’s predictions about the distant future. But when an organization or society is repeatedly faced with questions about the near future, it is possible to learn which individuals’ beliefs are trustworthy, and to reach rational consensus, without relying on perfect rationality. I believe that this problem should be a priority for a rational organization.

First: How?

A. Commit to precise predictions.

Individuals should commit to predictions about everything which matters to the organization and is amenable to prediction. Estimate quantitative measures of the success of a program, of demand for a service, of future revenue, of future income from donations, of relevant political changes, etc. Make predictions quickly and publicly, update on the predictions of others, and experiment with trusting different intuitions.

B. Track prediction quality.

Minimally, we would like the beliefs of an organization to be as good as the beliefs of its best member. Don’t just have members state predictions: measure their success as well. Use scoring rules to distribute trust to the most trustworthy agents. Record-keeping doesn’t need to be inefficient; make it fast, cultivate an atmosphere of keeping score and committing to errors, and build a record which can help individuals improve their own calibration and learn how they should be updating on the beliefs of others. Incentivize accurate prediction. Concentrate authority where it has been empirically effective, or where trusted group members predict it to be effective.

Require predictors to stake part of their reputation on predictions. The best scheme I know is Hanson’s market scoring rule, which rewards or punishes predictors based on their contribution to the collective probability estimate. When predictors need incentives, real or play money can be used to incentivize accuracy. When predictors are rationally interested in the success of the group, purely artificial reputation systems may be enough. In any case, disagreement is still possible. But if disagreements are frequent and pronounced and there is any difference in prediction quality, the difference will rapidly manifest itself in a difference of reputation (and if there is no difference, then by convexity an average of the two disagreeing predictors will outperform either and this fact will rapidly become common knowledge).
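The averaging claim can be checked numerically. The sketch below assumes the logarithmic score (described under "Proper scoring rules" later in this post) and uses a made-up record of binary events; the names `mean_log_score`, `a`, `b`, and `avg` are illustrative only. For the log score the relevant property is the concavity of the logarithm: log((p+q)/2) is at least (log p + log q)/2, with equality only when p = q.

```python
import math

def mean_log_score(forecasts, outcomes):
    """Average log score of probability forecasts for binary events (1 = happened)."""
    total = 0.0
    for p, happened in zip(forecasts, outcomes):
        # Score the probability that was assigned to what actually occurred.
        total += math.log(p if happened else 1.0 - p)
    return total / len(outcomes)

# Hypothetical record: two predictors who disagree on every question
# but turn out to be equally accurate overall.
outcomes = [1, 0, 1, 0]
a = [0.9, 0.4, 0.6, 0.1]
b = [0.6, 0.1, 0.9, 0.4]
avg = [(x + y) / 2 for x, y in zip(a, b)]  # simple average of the two forecasts

score_a = mean_log_score(a, outcomes)
score_b = mean_log_score(b, outcomes)
score_avg = mean_log_score(avg, outcomes)

print(f"A: {score_a:.4f}  B: {score_b:.4f}  average of A and B: {score_avg:.4f}")
```

On this record the two predictors tie, and the averaged forecast scores strictly higher than either, which is the situation the parenthetical above describes.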

C. Chain prediction mechanisms to obtain faster feedback.

When an expensive prediction mechanism is available, use a cheaper prediction mechanism to predict the output of the expensive mechanism. This can provide more rapid feedback, and allow accurate predictions for much broader classes of events. For example:

If I want to estimate the success of an advertising campaign or interface design choice, implementing it and then measuring the success of the overall program is expensive. Carefully implemented focus groups or A/B tests can predict the ultimate success of a choice; once the predictive power of such tests is established, the results of those tests can be predicted instead of the eventual success of the project. This may provide much faster feedback. Continuing recursively, we may learn that certain individuals are good at predicting the outcome of a focus group or A/B test. Peers can then predict the prediction of these individuals. This gives even faster feedback, and can free up time for that individual. Since the attention of people with accurate beliefs and good planning skills is very important, this is an important consideration.

D. Substitute testable assertions.

Sometimes we are interested in vague or otherwise problematic assertions, such as “In a year, you’ll regret this decision.” When it’s important, choose related precise statements that are amenable to formal prediction. Some examples:

Choose a trusted observer and a future time, and bet about the judgment of that observer in hindsight. For example, rather than betting on whether “50% of humans will use social networks regularly in 10 years”, choose a knowledgeable arbiter, offer them a moderate incentive to cooperate, and bet on whether they would agree with the statement “50% of humans use social networks regularly” in 10 years.

Imagine a possible outcome with precisely defined observable consequences, and make bets conditional on the realization of that outcome. For example, maybe it will only be clear whether or not the building is structurally sound if someone conducts a formal investigation. Then instead of betting on “The building foundation meets this standard”, bet on “If the building foundation is formally investigated, it will be found to meet this standard.”

Find other concrete markers (polls, surveys, price movements, financial outcomes, etc.) that correlate with the event in question and predict them instead.

E. Put a price on ideas.

The success of a proposed change, whether a new program or a structural change, can be predicted as well as anything else. Such predictions can be used to estimate the value (in money, play money, reputation) of a new proposal and reward or penalize individuals for their contribution. This may incentivize members if that is needed, but more importantly it allows the organization to intelligently trade off the time required to dismiss or test bad changes against the benefit of good changes. The judgment of whoever implements change can provide the feedback needed to train correct predictions about the value of a proposed change (see C above); allowing more organization members to engage in such predictions can conserve the time and attention of authority.

Once such predictions are accurate, ideas can be aggregated (as natural language proposals) automatically, and most proposals can be discarded without explicit discussion or significant time investment. Serious deliberation is only necessary for the ideas which are predicted to be most likely to be worth considering.

Second: Technical facts.

A. Proper scoring rules.

A scoring rule takes as arguments a predicted probability distribution and an actual outcome and outputs a score. A scoring rule is proper if honestly reporting your best estimate of the real probability distribution maximizes your expected score. A typical scoring rule with desirable properties is the logarithm of the probability assigned by the distribution to the actual outcome. Under this rule scores are never positive: the maximum score of 0 is attained by exactly predicting the result (assigning it probability 1), and a score of -log n is attained by the uniform distribution over n outcomes. Other scoring rules may be appropriate, depending on the consequences of an incorrect prediction.
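A minimal sketch of the logarithmic rule, to make "proper" concrete. The function names and the rain example are hypothetical; the point is only that, when outcomes really follow the forecaster's belief, reporting that belief earns a higher expected score than either exaggerating or hedging.

```python
import math

def log_score(distribution, outcome):
    """Log of the probability the forecast assigned to the realized outcome.

    Raises ValueError if the forecast gave the outcome probability 0,
    which corresponds to an unboundedly bad score.
    """
    return math.log(distribution[outcome])

def expected_log_score(true_dist, reported_dist):
    """Expected score of a report when outcomes are drawn from true_dist."""
    return sum(p * math.log(reported_dist[o])
               for o, p in true_dist.items() if p > 0)

# Hypothetical forecaster who genuinely believes rain has probability 0.7.
belief = {"rain": 0.7, "dry": 0.3}

honest = expected_log_score(belief, belief)
exaggerated = expected_log_score(belief, {"rain": 0.9, "dry": 0.1})
hedged = expected_log_score(belief, {"rain": 0.5, "dry": 0.5})

print(f"honest {honest:.4f}  exaggerated {exaggerated:.4f}  hedged {hedged:.4f}")
```

The honest report comes out ahead of both alternatives. The comparison is only as good as the assumption that the stated belief matches the true frequencies.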

B. Market scoring rules.

A scoring rule evaluates the quality of a single prediction. Robin Hanson has proposed market scoring rules to evaluate several individuals’ contributions to a collective probability distribution. A market scoring rule works as follows: the collective probability distribution is initialized to a reasonable estimate, and any group member can modify the collective estimate at any time. When the prediction is resolved, each person who changed the probability distribution is rewarded or punished by the difference between the score associated with the probability distribution which they proposed and the score associated with the probability distribution which they replaced. Market scoring rules can be simulated with a prediction market with an automated market maker (an artificial market participant who always offers to buy/sell at a price determined by the history of trades).
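The settlement step described above can be sketched in a few lines. This assumes the logarithmic score as the underlying rule; `settle_market`, the participant names, and the numbers are all made up for illustration, and this is not a full automated market maker.

```python
import math

def log_score(distribution, outcome):
    """Log of the probability assigned to the realized outcome."""
    return math.log(distribution[outcome])

def settle_market(initial, updates, outcome):
    """Pay each participant score(their proposal) - score(the estimate it replaced).

    initial: dict mapping outcome -> probability (the starting collective estimate)
    updates: list of (name, distribution) pairs in the order they were made
    Returns a dict mapping name -> total reward (negative means a penalty).
    """
    payouts = {}
    current = initial
    for name, proposed in updates:
        delta = log_score(proposed, outcome) - log_score(current, outcome)
        payouts[name] = payouts.get(name, 0.0) + delta
        current = proposed  # the proposal becomes the new collective estimate
    return payouts

# Hypothetical history for a yes/no question that resolves "yes".
initial = {"yes": 0.5, "no": 0.5}
updates = [
    ("alice", {"yes": 0.8, "no": 0.2}),
    ("bob", {"yes": 0.6, "no": 0.4}),
    ("alice", {"yes": 0.9, "no": 0.1}),
]
payouts = settle_market(initial, updates, "yes")
print(payouts)
```

Here bob is penalized for moving the estimate away from what happened, and alice is rewarded twice. The rewards telescope: their sum equals the score of the final estimate minus the score of the initial one, so whoever funds the rewards pays only for the net improvement in the collective estimate.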

C. Multiplicative Weights / Exp3

When consulting a group of experts (or choosing from among several possible strategies) a mathematically robust strategy is given by multiplicative weights (or Exp3): choose an advisor at random with probability proportional to a quantitative estimate of their reputation, and adjust the reputation of each expert by an amount proportional to the impact of that expert’s advice on your expected performance. This technique does about as well as the best expert, neglecting rare extremely bad or extremely good outcomes.
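A minimal sketch of the multiplicative weights idea, under some stated assumptions: this is the full-information variant (often called Hedge), in which every expert's loss is observed each round and losses lie in [0, 1]. Exp3 addresses the harder setting where only the chosen expert's loss is observed, and is not implemented here. The learning rate `eta`, the function names, and the data are hypothetical.

```python
import math
import random

def choose_expert(weights, rng):
    """Sample an expert index with probability proportional to its weight."""
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]

def update_weights(weights, losses, eta=0.5):
    """Multiply each weight by exp(-eta * loss); losses are assumed to be in [0, 1]."""
    return [w * math.exp(-eta * loss) for w, loss in zip(weights, losses)]

def run(loss_rounds, eta=0.5, seed=0):
    """Follow a randomly chosen expert each round, then reweight all experts."""
    rng = random.Random(seed)
    weights = [1.0] * len(loss_rounds[0])  # everyone starts with equal reputation
    total_loss = 0.0
    for losses in loss_rounds:
        chosen = choose_expert(weights, rng)
        total_loss += losses[chosen]
        weights = update_weights(weights, losses, eta)
    return weights, total_loss

# Hypothetical data: expert 0 is always right, expert 1 always wrong,
# expert 2 is half right, over 20 rounds.
rounds = [[0.0, 1.0, 0.5]] * 20
weights, total_loss = run(rounds)
print(weights, total_loss)
```

After a few rounds nearly all of the probability mass sits on the best expert, so the advice followed is mostly theirs. Over long horizons the raw weights shrink toward zero, so a practical version would probably renormalize them periodically.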

Third: Why Bother?

How important is rational agreement? Why can’t we use traditional organizational designs to make decisions in the absence of consensus, or arrive at consensus by more traditional mechanisms? My belief is that traditional structures do not result in rational behavior, in the same way that relying exclusively on human intuition does not result in rational behavior. Traditional organizations rely too extensively on the beliefs of upper management, they cannot make important decisions or implement change quickly because of a lack of elite attention, they fail to aggregate information about their own behavior, they fail to match the best beliefs of their constituents about important questions of fact, and they fail to encourage or select for rational behavior among their constituents. The sources of my belief are complicated and hard to explain. Here are some words which might help others arrive at similar conclusions:

I do not consider the non-adaptation of modern organizations compelling evidence because I have little faith in their rational behavior (circularly). I do not consider the non-adaptation of managers compelling evidence because I have little faith in the ability of most individuals to enact significant change, especially in the face of considerable organizational and social inertia. I do not consider anything from the academic literature in this field relevant, because it appears to suffer from persistently irrational methodology and general uselessness. I believe that intelligence and rationality vary remarkably even among elites, and do not accept the weak efficient market hypothesis applied to entrepreneurial endeavors because it seems to be obviously false. I have excellent evidence for the collective irrationality of society as a whole, and of historical institutions. I believe that I have little similar evidence for modern corporations only because I know more about history (and the details of most modern corporations are opaque to me). From my limited evidence, modern corporations do also behave irrationally: they fail to even hold collective beliefs on critical questions, their decisions are determined by artifacts of their authority structure, and they remain successful despite the existence of significant corrections to their behavior (I know two or three case studies, none of which I should discuss publicly; I only have any evidence at all because of indirect connections to management). If I abandon typical arguments for the efficacy of modern corporations, the sum of evidence against it is compelling. I put considerable weight on the beliefs of Robin Hanson, who has said many true and uncommon things and really does seem to update on the beliefs of others.

But I think the best argument is: I am willing to stake a lot of money on the belief that rational organizations, in the sense I describe, can consistently out-compete traditional organizations. Is anyone interested in betting against me?