Knightian Uncertainty from a Bayesian perspective

Some people have maintained that there are events to which there’s no rational basis for assigning probabilities. For example, John Maynard Keynes wrote of “uncertainty” in the following sense:

“By ‘uncertain’ knowledge, let me explain, I do not mean merely to distinguish what is known for certain from what is only probable. The game of roulette is not subject, in this sense, to uncertainty...The sense in which I am using the term is that in which the prospect of a European war is uncertain, or the price of copper and the rate of interest twenty years hence...About these matters there is no scientific basis on which to form any calculable probability whatever. We simply do not know.” (J.M. Keynes, 1937)

This sort of uncertainty is sometimes referred to as Knightian uncertainty.

MIRI is interested in making probabilistic predictions about events such as the creation of general artificial intelligence, which are without precedent, and which therefore cannot be assigned probabilities via frequentist means. Some of these events are presumably of the type that Keynes had in mind. At MIRI’s request, I did a literature review looking for arguments against there being a rational basis for assigning probabilities to such events.

Definitions of subjective probability

One can attempt to define the subjective probability that an agent assigns to an event as, intuitively, the number it would assign if it were making a very large number of predictions and aiming, for each x, to assign probability x% to collections of events of which x% actually occur. Eliezer discusses the mathematical formalism behind this in A Technical Explanation of a Technical Explanation.
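
This criterion can be checked empirically: group many stated predictions by the probability assigned, and compare each group’s stated probability with the fraction of its events that actually occurred. Here is a minimal sketch (the forecaster’s data and the function name are invented for illustration):

```python
from collections import defaultdict

def calibration_table(predictions):
    """predictions: iterable of (stated_probability, event_occurred) pairs.
    For each stated probability, return (observed frequency, count).
    A well-calibrated forecaster has observed frequency close to the
    stated probability in every group."""
    groups = defaultdict(list)
    for p, occurred in predictions:
        groups[p].append(occurred)
    return {p: (sum(o) / len(o), len(o)) for p, o in sorted(groups.items())}

# A forecaster who says "70%" should be right about 70% of the time:
preds = ([(0.7, True)] * 7 + [(0.7, False)] * 3
         + [(0.2, True)] * 2 + [(0.2, False)] * 8)
print(calibration_table(preds))  # {0.2: (0.2, 10), 0.7: (0.7, 10)}
```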

Other definitions of subjective probabilities have been given by Ramsey (1931), de Finetti (1937), Koopman (1940), Good (1950), Savage (1954), Davidson and Suppes (1956), Kraft, Pratt and Seidenberg (1959), Anscombe and Aumann (1963) and Wakker (1989). (Fishburn (1986) gives a survey of the literature.) I have not studied the mathematical formalisms of most of these papers, but here’s a definition inspired by them (one which is immune to some of the criticisms that have been raised against some of the definitions).

Assume that for each number p between 0 and 1, there is a random process R that yields an outcome O’ with “objective” probability p, where an “objective” probability is one that can be determined via physics or frequentist means. Your subjective probability of an event E is then defined as follows. Suppose that there is an event F that you strongly desire to happen, and that you’re given a choice between the following options:

  1. F occurs if and only if E occurs.

  2. F occurs if and only if the outcome of R is O’.

Consider the set S of values of p such that you’d prefer #2 over #1. Then your subjective probability q of E is defined to be the greatest lower bound of S.

(F is usually taken to be a monetary reward arising from a bet.)
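
The definition suggests an elicitation procedure: vary p and binary-search for the threshold at which the agent switches from preferring #1 to preferring #2. A minimal sketch, assuming the agent’s preference is monotone in p (prefers_lottery is a hypothetical stand-in for querying the agent, not part of any real protocol):

```python
def elicit_subjective_probability(prefers_lottery, tolerance=1e-6):
    """prefers_lottery(p) -> True if the agent prefers option #2 when the
    random process R yields O' with objective probability p. Assuming the
    agent's preference is monotone in p, the returned value approximates
    q = inf S, the greatest lower bound from the definition above."""
    lo, hi = 0.0, 1.0
    while hi - lo > tolerance:
        mid = (lo + hi) / 2
        if prefers_lottery(mid):
            hi = mid  # the agent already prefers #2 at mid, so q <= mid
        else:
            lo = mid  # the agent still prefers #1 at mid, so q > mid
    return (lo + hi) / 2

# Example: an agent whose implicit probability for E is 0.3.
print(elicit_subjective_probability(lambda p: p > 0.3))  # ~0.3
```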

For example, suppose that E and F are both the event “humanity survives for millions of years,” and that you have the opportunity to push a button that will guarantee this with probability p and otherwise guarantee that it does not happen. If you’re willing to push the button when p = 99.999%, that means that you assign a probability less than 99.999% to humanity surviving for millions of years. If you’re not willing to push it when p = 0.001%, that means that you assign a probability greater than 0.001% to humanity surviving for millions of years.

Some objections to the definition are:

  • Your value of q is sensitive to factors such as framing effects, your mood, and what evidence you happen to have considered most recently. Kyburg (1968) discusses this on pages 57–58, and Shafer (1986) discusses it on page 465. So q may not be well defined as a number.

  • It assumes that you’ve considered the question in the definition. If an agent has never considered an event E, it doesn’t have a probability attached to E stored in its memory, even implicitly. And even if one has considered the event E, one may not have had occasion to make an assessment, because there may be no event F for which option #1 in the definition could plausibly hold.

These two objections also apply to the definition that Eliezer discusses in A Technical Explanation of a Technical Explanation.

Addressing these points in turn:

  • q may still be well defined as an interval: in the example above involving humanity surviving for millions of years, the value of q that you assign could fluctuate wildly between 10% and 90% depending on when and how you’re asked, while always remaining between 0.001% and 99.999%. Keynes discussed this in A Treatise on Probability, Kyburg suggests it in his 1968 paper, and Niklas Möller cites Ellsberg (1961), Kaplan (1983) and Levi (1986) on page 66 of Handbook of Risk Theory.

  • One can make the agent aware of the possibility of the event E, and try to create a suitable event F. This may not be feasible, for example because one lacks the resources to create such an event F, or because E is in the far future. But if one wishes to assign a probability to an event E, one can imagine an associated event F and imagine making the choice between #1 and #2.

Pragmatic objections to assigning subjective probabilities

Even if subjective probabilities are well-defined (up to the two issues mentioned above), assigning a subjective probability in a given instance could be bad for one’s epistemology. Some proponents of the idea of Knightian uncertainty may implicitly adhere to this position. Some ways in which assigning a subjective probability can lead one astray are given below.

Overconfidence in models

Suppose that one has a model of the world that one thinks is probably right and according to which the probability of an event E is extremely small. If one forgets that the model might be wrong, one might erroneously conclude that the probability of E occurring is extremely small. (Yvain discussed this in Confidence levels inside and outside an argument.)

This appears to be close to Keynes’ objection to assigning subjective probabilities. I have not studied Keynes’ original work, but several people who have written about him seem to implicitly ascribe this position to him. For example, in a book review discussing Keynes, John Gray wrote:

Even our list of possible outcomes may turn out to have omitted the ones that are most important in shaping events. Such an omission was one of the factors that led Long-Term Capital Management, a highly leveraged hedge fund set up by two Nobel Prize winning economists, to fail in 1998-2000. The information used in applying the formula did not include the possibility of such events as the Asian financial crisis and Russia’s default on its sovereign debt, which destabilised global financial markets and helped destroy the fund. The orthodoxy that came unstuck with the collapse of LTCM was not faulty because it neglected the vagaries of human moods; its mistake was to think that the unknown future could be turned into a set of calculable risks and, in effect, conjured out of existence, which was impossible. Several centuries earlier, Pascal – one of the founders of probability theory – had come to the same conclusion, when in the Pensées he asks ironically: ‘Is it probable that probability brings certainty?’ The central flaw of the economic orthodoxy against which Keynes fought in the 1930s was to imagine that an insoluble problem – human ignorance of the future – had been solved. The error was repeated in the 1990s, when economists came to believe that complex mathematical formulae could tame uncertainty in the murky world of derivatives.

One can assign a probability to one’s model of the world being accurate, to account for model uncertainty. Keynes’ position is perhaps best interpreted as a statement about effect size: a claim that the probability that one should assign to one’s model being inaccurate is large.
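
To see the effect size concretely, here is a toy calculation (all numbers invented): even a 1% chance that the model is wrong can swamp an extremely small within-model probability.

```python
# P(E) = P(model right) * P(E | model right)
#      + P(model wrong) * P(E | model wrong)
p_model_right = 0.99       # how much you trust the model
p_event_if_right = 1e-9    # the model's (extremely small) estimate
p_event_if_wrong = 0.10    # rough placeholder: if the model is wrong,
                           # you can say little about E

p_event = (p_model_right * p_event_if_right
           + (1 - p_model_right) * p_event_if_wrong)
print(p_event)  # ~0.001: about a million times the within-model estimate
```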

Insensitivity to robustness of evidence

Kyburg (1968) argues that probabilities don’t adequately pick up on robustness of evidence. He gives the example of drawing balls from an urn with black and white balls of unknown relative frequencies. He says that there’s a big difference between

  1. An initial guess that the relative frequencies are 50%-50%

  2. A guess that the relative frequencies are 50%-50% after having drawn 1,000 balls and finding that the relative frequencies of the colors of balls drawn are about 50%-50%

saying

The person who offers odds of two to one on the first ball is not at all out of his mind in the same sense as the person who offers two to one odds on the 1001st ball.

A single probability estimate does not convey how much one should update in response to incoming evidence. If one assigns a probability p to an event, one might mentally place the event in the reference class “events with probability p” and update too little or too much in response to incoming evidence, by anchoring on other events of probability p whose probabilities are more or less robustly established than that of the event in question.

This may be addressed by replacing the subjective probability of an event with a probability distribution over its probability: for each number p between 0 and 1, one associates a probability q_p that the event occurs with probability p. Quoting page 67 of Handbook of Risk Theory:

Multivalued measures generally take the form of a function that assigns a numerical value to each probability value between 0 and 1. This value represents the degree of reliability or plausibility of each particular probability value. Several interpretations of the measure have been used in the literature, for example, second-order probability (Baron 1987; Skyrms 1980), fuzzy set membership (Unwin 1986; Dubois and Prade 1988), and epistemic reliability (Gärdenfors and Sahlin 1982). See Möller et al. (2006) for an overview.

Probability, knowledge, and meta-probability discusses E.T. Jaynes’ approach to this.
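
A minimal sketch of this idea in Kyburg’s urn setting, using a Beta distribution over the urn’s unknown frequency as the second-order measure (the standard conjugate-prior treatment; the function and numbers are mine, not drawn from the papers cited above). Both states of knowledge below assign probability 0.5 to the next ball being white, but they respond very differently to ten further white draws:

```python
def prob_next_white(whites_seen, blacks_seen, prior_a=1.0, prior_b=1.0):
    """Mean of the Beta(prior_a + whites_seen, prior_b + blacks_seen)
    posterior over the urn's white-ball frequency, i.e., the probability
    that the next ball drawn is white."""
    a = prior_a + whites_seen
    b = prior_b + blacks_seen
    return a / (a + b)

# Kyburg's "first ball" vs. "1001st ball": the same point estimate...
print(prob_next_white(0, 0))      # 0.5
print(prob_next_white(500, 500))  # ~0.5

# ...but very different responses to ten additional white draws:
print(prob_next_white(10, 0))     # ~0.917: the initial guess moves a lot
print(prob_next_white(510, 500))  # ~0.505: the well-evidenced estimate barely moves
```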

Suppression of dependency of events

Given two events A and B to which one assigns probabilities p and q, the numbers p and q do not suffice to determine the probability that A and B both occur. If one assigns probabilities to events and forgets where the probabilities came from, there’s a risk of tacitly assuming that the events are independent, and of assigning probability pq to the conjunction of A and B, when the probability of the conjunction could be much higher or much lower. According to chapter 1 of Nate Silver’s book The Signal and the Noise, similar mistakes contributed to the 2008 financial crisis: people in finance assigned a vastly smaller probability to a very large number of house prices dropping than to a small number dropping, implicitly treating the drops as nearly independent, even though the prices of different houses were in fact correlated.
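
The point can be made precise with the Fréchet bounds: the marginal probabilities p and q only constrain the probability of the conjunction to an interval. A toy illustration (all numbers invented):

```python
p = 0.1  # P(A): e.g., "house A's price drops a lot"
q = 0.1  # P(B): e.g., "house B's price drops a lot"

# The marginals alone only pin the conjunction down to the
# Fréchet interval [max(0, p + q - 1), min(p, q)].
assume_independent = p * q         # 0.01
upper_bound = min(p, q)            # 0.10, attained if A and B are perfectly correlated
lower_bound = max(0.0, p + q - 1)  # 0.00, attained if A and B never co-occur

print(assume_independent, lower_bound, upper_bound)
# Treating correlated events as independent can understate the joint
# probability here by a factor of up to 10.
```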

Conclusion

While some people have said that subjective probabilities of arbitrary events are not meaningful, there are definitions that make the notion of subjective probability meaningful, though arguably only as intervals rather than as numbers. Using intervals rather than numbers addresses some of the objections that have been raised.

A large part of the debate about whether one should assign subjective probabilities to arbitrary events is perhaps best conceptualized as a debate about how large the probability intervals one assigns should be. In Worst Case Scenarios (p. 160), Sunstein wrote:

Suppose that the question is the likelihood that at least 100 million human beings will be alive in 10,000 years. For most people equipped with the knowledge they have, no probability can sensibly be assigned. Perhaps uncertainty is not unlimited; the likelihood can reasonably be described as above 0 percent and below 100 percent. But beyond that point, little can be said.

In any given instance, the question is how much can be said. If you have a model of the world M that’s accurate with probability at least p, and M predicts an event E with probability at least q, then the probability of E is at least pq. If p is low, this doesn’t give a good lower bound on the probability of E. But suppose you have two independent models M_1 and M_2, where M_i is accurate with probability at least p_i and predicts E with probability at least q_i. Then the probability of E is bounded below by p_1q_1 + p_2q_2 - p_1q_1p_2q_2, i.e., by 1 - (1 - p_1q_1)(1 - p_2q_2), the probability that at least one of the two lower-bound arguments goes through. So by using model combination you can get a better lower bound on the probability of E (although in practice the models used may not be fully independent, and if they’re positively correlated the lower bound will be weaker).
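
Here is that combination rule as a small sketch (the specific numbers are invented):

```python
def combined_lower_bound(p1, q1, p2, q2):
    """Lower bound on P(E) from two independent models: E is 'missed' by
    both lower-bound arguments only with probability (1 - p1*q1)*(1 - p2*q2)."""
    return 1 - (1 - p1 * q1) * (1 - p2 * q2)  # = p1*q1 + p2*q2 - p1*q1*p2*q2

# One weak model alone gives a lower bound of 0.3 * 0.5 = 0.15;
# two such independent models together do noticeably better:
print(combined_lower_bound(0.3, 0.5, 0.3, 0.5))  # 0.2775
```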

The ways in which assigning subjective probabilities can be bad for one’s epistemology seem to fall under the broad heading of “failing to incorporate all of one’s knowledge when assigning a probability, then using the probability uncritically, or forgetting that the probability one assigns to an event does not fully capture one’s knowledge pertaining to the event.” These issues can be at least partially mitigated by keeping them in mind.