Dutch Books and Decision Theory: An Introduction to a Long Conversation
For a community that endorses Bayesian epistemology we have had surprisingly few discussions about the most famous Bayesian contribution to epistemology: the Dutch Book arguments. In this post I present the arguments, but it is far from clear what the right way to interpret them is, or even whether they prove what they set out to prove. The Dutch Book arguments attempt to justify the Bayesian approach to science and belief; I will also suggest that any successful Dutch Book defense of Bayesianism cannot be disentangled from decision theory. But mostly this post is to introduce people to the argument and to get people thinking about a solution. The literature is scant enough that it is plausible people here could actually make genuine progress, especially since the problem is related to decision theory.1
Bayesianism fits together. Like a well-tailored jacket it feels comfortable and looks good. It's an appealing, functional aesthetic for those with cultivated epistemic taste. But sleekness is not a rigorous justification, and so we should ask: why must the rational agent adopt the axioms of probability as conditions for her degrees of belief? Further, why should agents accept the principle of conditionalization as a rule of inference? These are the questions the Dutch Book arguments try to answer.
The arguments begin with an assumption about the connection between degrees of belief and willingness to wager. An agent with degree of belief b in hypothesis h is assumed to be willing to pay up to and including $b for a unit wager on h, and to sell a unit wager on h for any price down to and including $b. For example, if my degree of belief that I can drink ten eggnogs without passing out is .3, I am willing to bet $0.30 on the proposition that I can drink the nog without passing out when the stakes of the bet are $1. Call this the Will-to-wager Assumption. As we will see, it is problematic.
The Synchronic Dutch Book Argument
Now consider what happens if my degree of belief that I can drink the eggnog is .3 and my degree of belief that I will pass out before I finish is .75. Given the Will-to-wager assumption my friend can construct a series of wagers that guarantee I will lose money. My friend could offer me a wager where I pay $0.30 for $1.00 stakes if I finish the eggnog. He could simultaneously offer me a bet where I pay $0.75 for $1.00 stakes if I pass out. Now if I down the eggnog I win $0.70 from the first bet but I lose $0.75 from the second bet, netting me -$0.05. If I pass out I lose the $0.30 from the first bet, but win $0.25 from the second bet, netting me -$0.05. In gambling terminology these lose-lose bets are called a Dutch book. What's cool about this is that violating the axioms of probability is a necessary and sufficient condition for degrees of belief to be susceptible to Dutch books, as in the above example. This is quite easy to see, but the reader is welcome to pursue formal proofs: representing degrees of belief with only non-negative numbers, setting b(all outcomes)=1, and making b additive makes it impossible to construct a Dutch book. A violation of any axiom allows the sum of all b in the sample space to be greater than or less than 1, enabling a Dutch book.
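Here is a minimal sketch in Python (my own illustration, not part of the original argument) that just tallies the payoffs of the two eggnog bets above, assuming $1.00 stakes and prices of $0.30 and $0.75:

```python
# Prices implied by the Will-to-wager assumption: $0.30 on "I finish"
# and $0.75 on "I pass out", each for a $1.00 stake.
price_finish = 0.30
price_pass_out = 0.75

for finish in (True, False):
    bet_on_finishing = (1.00 if finish else 0.00) - price_finish
    bet_on_passing_out = (1.00 if not finish else 0.00) - price_pass_out
    print(f"finish={finish}: net ${bet_on_finishing + bet_on_passing_out:+.2f}")
# Both branches print -$0.05: a guaranteed loss, i.e. a Dutch book.
```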
The Diachronic Dutch Book Argument
What about conditionalization? Why must a rational agent believe h1 at b(h1|h2) once she learns h2? For this we update the Will-to-wager assumption to have it govern degrees of belief in hypotheses conditional on other hypotheses. An agent with degree of belief b in hypothesis h1|h2 is assumed to be willing to wager up to and including $b in a unit wager on h1 conditional on h2. This is a wager that is called off, with the stake refunded, if h2 turns out false, and that proceeds as an ordinary wager on h1 if h2 turns out true. Say I believe with b=0.9 that I will finish ten drinks if we decide to drink cider instead of eggnog. Say I also believe with b=0.5 that we will drink cider and with b=0.5 that we will drink eggnog. But say I *don't* update my beliefs according to the principle of conditionalization: once I learn that we will drink cider my belief that I will finish the drinks is only b=0.7. Given the Will-to-wager assumption I accept the following wagers.
(1) An unconditional wager on h2 (that we drink cider, not eggnog) that pays $0.20 if h2 is true, priced at b(h2) × $0.20 = 0.5 × $0.20 = $0.10.
(2) A unit wager on h1 (finishing ten drinks) conditional on h2 that pays $1.00, priced at b(h1|h2) × $1.00 = 0.9 × $1.00 = $0.90.
If h2 is false I lose $0.10 on wager (1). If h2 is true I win $0.10. But now I'm looking at all that cider and not feeling so good. I decide that my degree of belief that I will finish those ten ciders is only b=0.7. So my friend buys from me an unconditional wager (3) on h1 that pays $1.00, priced at b(h1) × $1.00 = 0.7 × $1.00 = $0.70.
Then we start our drinking. If I finish the cider I gain $0.10 from wager (2), which puts me up $0.20, but then I lose $0.30 on wager (3) and I'm down $0.10 on the day. If I don't finish the cider I win $0.70 from wager (3), which puts me at $0.80, until I count the $0.90 stake I lose on wager (2) and go down to -$0.10 on the day.
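Again, a minimal sketch (my own, using the prices above: $0.10 for wager (1), $0.90 for wager (2), and $0.70 received for selling wager (3)) that tallies every branch of the diachronic example:

```python
def net_payoff(cider: bool, finish: bool) -> float:
    """Total winnings across the three wagers described above."""
    total = (0.20 if cider else 0.00) - 0.10          # wager (1): unconditional bet on cider
    if cider:
        total += (1.00 if finish else 0.00) - 0.90    # wager (2): bet on finishing, conditional on cider
        total += 0.70 - (1.00 if finish else 0.00)    # wager (3): sold to my friend at my new price of 0.7
    # If we drink eggnog instead, wager (2) is called off and wager (3) is never made.
    return total

for cider, finish in [(True, True), (True, False), (False, False)]:
    print(f"cider={cider}, finish={finish}: ${net_payoff(cider, finish):+.2f}")
# Every branch prints -$0.10: a guaranteed loss however the day goes.
```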
Note again that updating your degree of belief in any hypothesis h, upon learning evidence e, to anything other than b(h|e) leaves you vulnerable to a diachronic Dutch book.
The Will-to-wager Assumption or Just What Does This Prove, Anyway?
We might want to take the above arguments literally and say they show that not treating your degrees of belief like probabilities is liable to lead you into lose-lose wagers. But this would be a very dumb argument. First, there is no reason for anyone to actually make wagers in this manner: these are wagers with zero expected gain which presumably involve transaction costs, and no rational person would make wagers the way the Will-to-wager assumption describes. Second, the argument presented above uses money, and as we are all aware, money has diminishing returns. You probably shouldn't bet $100 for a one in a million shot at $100,000,000 because a hundred million dollars is probably not a million times more useful than a hundred dollars. Third, the argument assumes a rational person must want to win bets. A person might enjoy the wager even if the odds aren't good, or might prefer life without the money.
Nonetheless, the Will-to-wager Assumption doesn't feel arbitrary; it just isn't clear what it implies. There are a couple of different strategies we might pursue to improve this argument. First, we can improve the Will-to-wager assumption and the corresponding Dutch book theorems by making them about utility instead of money.
We start by defining a utility function, υ: X→R, where X is the set of outcomes and R is the set of real numbers. A rational agent is one that acts to maximize expected utility according to this function. An agent with degree of belief b in hypothesis h is assumed to be willing to wager up to and including b utils on a one-util wager on h. As a literal ascription of willingness to wager this interpretation still doesn't make sense. But we can think of the wagers here as general stand-ins for decisions made under uncertainty. The Will-to-Wager assumption fails to work when taken literally because in real life we can always decline wagers. But we can take every decision we make as a forced selection of a set of wagers from an imaginary bookie that doesn't charge a vig and pays out in utility whether you live or die. The bookie sometimes offers a large, perhaps infinite selection of sets of wagers to pick from and sometimes offers only a handful. The agent can choose one and only one set at a time. Agents have little control over what wagers get offered to them, but in many cases one set will clearly be better than the others. But the more an agent's treatment of her beliefs diverges from the laws of probability, the more often she's going to get bilked by the imaginary bookie. In other words, the key might be to transform the Dutch Book arguments into decision theory problems. These problems would hopefully demonstrate that non-Bayesian reasoning creates a class of decision problems which the agent always answers sub-optimally or inconsistently. 2
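To make the forced-selection picture concrete, here is a toy sketch (my own example with made-up numbers, not anything from the literature): an agent whose credences are sub-additive turns down a set of wagers that dominates doing nothing in every state of the world.

```python
STATES = ["rain", "no_rain"]
credence = {"rain": 0.4, "no_rain": 0.4}   # sums to 0.8, violating additivity

# Each wager is (price in utils, payout in utils by state).
do_nothing = []
sure_gain = [(0.425, {"rain": 1.0, "no_rain": 0.0}),   # pays on rain
             (0.425, {"rain": 0.0, "no_rain": 1.0})]   # pays on no rain

def subjective_value(wager_set):
    """Expected net utility of a wager set, by the agent's own credences."""
    return sum(sum(credence[s] * payout[s] for s in STATES) - price
               for price, payout in wager_set)

def actual_net(wager_set, state):
    """Net utility the wager set actually yields in a given state."""
    return sum(payout[state] - price for price, payout in wager_set)

print(round(subjective_value(sure_gain), 3))                 # -0.05, so the agent picks do_nothing
print([round(actual_net(sure_gain, s), 3) for s in STATES])  # [0.15, 0.15]: a sure gain forgone
```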
A possible downside to the above strategy is that it leaves rationality entangled with utility. There have been some attempts to rewrite the Dutch Book arguments to remove the aspects of utility and preference embedded in them. The main problem with these strategies is that they tend either to fail to remove all notions of preference3 or to introduce some kind of apparatus that already resembles probability for no particular reason.4,5 Our conception of utility is in a Goldilocks spot: it has exactly what we need to make sense of probability while also being something we're familiar with; we don't have to invent it whole cloth. We might also ask a further question: why should beliefs come in degrees? The fact that our utility function (insofar as humans have one) seems to consist of real numbers and isn't binary (for example) might explain why. You don't need degrees of belief if all but one possible decision are always of value 0. In discussions here many of us have also been given to concluding that probability is epiphenomenal to optimal decision making. Obviously if we believe that, we're going to want a Dutch book argument that includes utility. Moreover, any successful reduction of degrees of belief to some decision-theoretic measure would benefit from a set of Dutch book arguments that left out degrees of belief altogether.
As you can see, I think a successful Dutch book argument will probably keep probability intertwined with decision theory, but since this is our first encounter with the topic: have at it. Use this thread to generate some hypotheses, both for decision-theoretic approaches and for approaches that leave out utility.
1 This post can also be thought of as an introduction to basic material and a post accompanying “What is Bayesianism”.
2 I have some more specific ideas for how to do this, but I can't well present everything in this post and I'd like to see if others come up with similar answers. Remember: discuss a problem exhaustively before coming to a conclusion. I hope people will try to work out their own versions, here in the comments or in new posts. It is also interesting to examine what kinds of utility functions can yield Dutch books: consider what happens, for example, when the utility function is strictly deontological, where every decision consists of a 1 for one option and a 0 for all the others. I also worry that some of the novel decision theories suggested here might have some Dutch book issues. In cases like the Sleeping Beauty problem where the payoff structure is underdetermined things get weird. It looks like this is discussed in "When Betting Odds and Credences Come Apart" by Bradley and Leitgeb. I haven't read it yet though.
3 See Howson and Urbach, “Scientific Reasoning, the Bayesian Approach” as an example.
4 Helman, “Bayes and Beyond”.
5 For a good summary of these problems see Maher, “Depragmatizing Dutch Book Arguments” where he refutes such attempts. Maher has his own justification for Bayesian Epistemology which isn’t a Dutch Book argument (it uses Representation theory, which I don’t really understand) and which isn’t available online that I can find. This was published in his book “Betting on Theories” which I haven’t read yet. This looks pretty important so I’ve reserved the book, if someone is looking for work to do, dig into this.
I have a question about Dutch books. If I give you a finite table of probabilities, is there a polynomial time algorithm that will verify that it is or is not Dutch bookable? Or: help me make this question better-posed.
It reminds me of Boolean satisfiability, which is known to be NP complete, but maybe the similarity is superficial.
On the assumption that the known probability is the implied payoff (in reverse of betting, where the known payoff is assumed to be the implied probability of the bet) you can check for a dutch-book by summing the probabilities. Above one and it books the gambler (who will probably not buy it), below one and it books the bookmaker.
This is because Dutch books have to be profitable over every possible outcome. There is a procedure called dutching which is very similar, except it doesn’t guarantee a profit; it just forms a Dutch book over a restricted set of bets. This is no longer exhaustive, so there are outcomes in dutching where all of your bets fail and you lose money.
I am not sure what changes if the payoff and the probability of paying off are not equivalent.
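A minimal sketch of the sum test described above, assuming the outcomes are mutually exclusive and exhaustive and that each bet pays $1 at a price equal to its stated probability:

```python
def who_gets_booked(probabilities):
    """Return which side is guaranteed a loss, under the assumptions above."""
    total = sum(probabilities)
    if total > 1:
        return "gambler"     # pays more than $1 for exactly $1 of payout
    if total < 1:
        return "bookmaker"   # collects less than $1 against $1 of liability
    return "nobody"

print(who_gets_booked([0.30, 0.75]))  # 'gambler', as in the eggnog example above
```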
Yes. You can do the same for a diachronic Dutch book, which takes the table of probabilities that describes an agent's beliefs before the agent learns E and after the agent learns E. For all H, p(H) in table 2 must equal p(H|E) in table 1. If p(H) does not equal p(H|E) then the agent these tables describe is Dutch bookable, assuming she will wager at those probabilities.
It would seem that something is “Dutch Bookable” so long as the sum of probabilities doesn’t add up to 1, which should not be a very difficult task at all.
I’m hoping this helps you clarify the question, since I feel like this answer probably doesn’t actually address your intent :)
Well, depends on if the probabilities overlap. So:
P(A)=.5 P(A&B)=.1 P(A&~B)=.2
is Dutch-Bookable
It seems closer to the solvability of a system of linear equations. Depends on what kind of probabilities you get? Like if you have
P(A), P(B), P(C), P(A&B)=P(B&C)=P(A&C)=0, it's trivial.
But if you have
P()=1/4
then you’ve got trouble.
Everyone’s of course right. But it means I don’t see a place for my train of thought to go.
Great question, I think I’ll work on it. I agree with Eliezer—my intuition tells me that there exists some algorithm that does this in polynomial time in the size of the table… but I’m also curious how many bets the Dutch book has to have, in general (as a function of the number of branching points and the number of choices at each update).
If we let the probabilities go to 0 and 1, doesn’t this problem just become SAT?
EDIT: As it turns out, no, no it doesn’t.
I expect an algorithm polynomial in the size of the table would exist.
It’s NP-hard. Here’s a reduction from the complement problem of 3SAT: let’s say you have n clauses of the form (p and not-q and r), i.e., conjunctions of 3 positive or negated atoms. Offer bets on each clause that cost 1 and pay n+1. The whole book is Dutch iff the disjunction of all the clauses is a propositional tautology.
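For illustration, a brute-force sketch of the book described above (exponential in the number of atoms, so only a toy); the clauses at the bottom are a hypothetical example, not from the comment:

```python
from itertools import product

def book_is_dutch(clauses, atoms):
    """Each clause-bet costs 1 and pays len(clauses)+1 if its conjunction holds."""
    n = len(clauses)
    for values in product([False, True], repeat=len(atoms)):
        world = dict(zip(atoms, values))
        payout = sum(n + 1 for clause in clauses
                     if all(world[atom] == sign for atom, sign in clause))
        if payout - n <= 0:   # the buyer paid n; no guaranteed profit in this world
            return False
    return True               # the buyer profits in every world, so the bookie is booked

# A clause like (p and not-q) is written [("p", True), ("q", False)].
toy_clauses = [[("p", True)], [("p", False)]]       # hypothetical degenerate example: p, not-p
print(book_is_dutch(toy_clauses, ["p", "q", "r"]))  # True: one clause always pays
```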
I’ve written some speculations about what this might mean. The tentative title is “Against the possibility of a formal account of rationality”:
http://cs.stanford.edu/people/slingamn/philosophy/against_rationality/against_rationality.pdf
I really like the Less Wrong community’s exposition of Bayesianism so I’d be delighted to have feedback!
Hm… if your probability assignments are conjunctions of that form, is it still true that finding a Dutch book is polynomial in the size of the probability table that would be required to store the entire joint probability distribution corresponding to every possible assignment of all atoms? I.e., NP-hard in the number of conjunctions, but polynomial in the size of the entire probability distribution?
Interesting. I’m actually not sure. The general result by Paris I cited is a little unclear. He proves CONSISTENCY (consistency of a set of personal probability statements) to be NP-complete. First he gets SAT \leq_P CONSISTENCY, but SAT is only O(2^n) in the number of atoms, not in the number of constraints. However, the corresponding positive result, that CONSISTENCY is in NP, is proven using an algorithm whose running time depends on the whole length of the input.
It could be that if you have the whole table in front of you, checking consistency is just checking that all the rows and columns sum to 1.
However, I don’t think that looking at the complete joint distribution is the right formalization of the problem. For example, I have beliefs about 100 propositions, but it doesn’t seem like I have 2^100 beliefs about the probabilities that they co-occur.
Yes, a complete probability table is coherent iff all entries sum to 1. But what do you mean by “the” complete probability table corresponding to a given set of constraints? There’s often more than one such table.
Oh, thanks, you’re completely right.
To unify all the language and make things explicit: if you have n atoms, then there are 2^n possible states of the world (truth assignments to the atoms). Then, if you have a personal probability for each of the 2^n states ("complete joint distribution", "complete table"), you can check consistency by summing them and seeing that you get 1. This is linear in the size of the table.
The question at stake seems to be something like this: does the agent legitimately have access to her (exponentially large) complete joint distribution? Or does she only have access to personal probabilities for a small number of statements (for example, a few conjunctions of atoms)? In the second case, there may be no complete joint distribution corresponding to her personal probabilities (if she’s inconsistent), exactly one (if the joint distribution is completely specified, possibly implicitly via independence assumptions that uniquely determine it), or infinitely many.
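A minimal sketch of that distinction (my own toy code, not from the thread): a fully specified joint table is coherent iff its non-negative entries sum to 1, whereas a partial set of constraints may be satisfied by zero, one, or many such tables.

```python
from itertools import product

def joint_table_is_coherent(table, tol=1e-9):
    """table maps each truth assignment (a tuple of bools) to a probability."""
    return all(p >= 0 for p in table.values()) and abs(sum(table.values()) - 1) < tol

# The uniform distribution over two atoms: one of many tables compatible
# with, say, the single constraint P(A) = 0.5.
uniform = {assignment: 0.25 for assignment in product([False, True], repeat=2)}
print(joint_table_is_coherent(uniform))  # True
```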
Can I just take a second to boast about this.
Here’s a paper that I think you’ll find interesting.
I’m not very familiar with the literature, but just off the top my head:
It seems to me that the force of ‘Dutch book’ type arguments can easily be salvaged from criticisms such as these.
OK. Rather than ask the subject whether they would make this wager, ask them, “if you were forced to accept one side of the wager or the other, which side would you prefer?” (Or “for which value of b would you become indifferent?”)
Yes, but if the amounts of money are small enough relative to the subject’s “bankroll” then it’s a ‘good enough’ approximation. If necessary you could just arbitrarily stipulate that each person is going to be given an extra million bucks simply for taking part, but keep the stakes of the actual wager very low—just a few dollars.
(EDIT: I have removed a paragraph which was confused.)
I remember that in another discussion where the diminishing return of money was given as an issue with a thought experiment, somebody suggested that you could eliminate the effect by just stipulating that (in the case of a wager) your winnings will be donated to a highly efficient charity that feeds starving children or something.
My own thoughts: If the amount of money involved is so small that it would be worthless to any charity, just multiply everything by a sufficiently large constant. If you run out of starving children, start researching a cure for cancer, and once that’s cured you can start in on another disease, etc. Once all problems are solved, we can assume that the standard of living can be improved indefinitely.
Hm. I’d been meaning to ask about this apparent circularity in the foundations for a bit, and now this tells me the answer is “we don’t know yet”.
(Specifically: VNM proves the analogue of the "will-to-wager" assumption, but of course it assumes our usual notion of probability. Meanwhile the Dutch book argument proves that you need our usual notion of probability, assuming the notion of utility! I guess we can say these at least demonstrate the two are equivalent in some sense. :P )
Savage’s representation theorem in Foundations of Statistics starts assuming neither. He just needs some axioms about preference over acts, some independence concepts and some pretty darn strong assumptions about the nature of events.
So it’s possible to do it without assuming a utility scale or a probability function.
I suppose this would be a good time to point anyone stumbling on this thread to my post that I later wrote on that theorem. :)
It seems like building a bet that only a Bayesian system wouldn’t get Dutch-booked on might be possible.
Hmm. What exactly do you mean?
Nevermind. I had it in mind that un-dutch-bookability was a property of perfect Bayesianism alone—so that in the way that all other methods are approximations of Bayes, plus or minus some constant, or restricted in some way, these departures from Bayes would open up non-Bayes systems to some incorrect calculation of probabilities, and so it would be possible to set up a situation where their calculation departs from Bayes in a systematic way, and then set up a Dutch book diachronically. It seems like you might be able to do this for frequentism, but it turns out that there are other, non-Bayes systems that are immune to Dutch books as well.
[Late night paraphrasing deleted as more misunderstanding/derailing than helpful. Edit left for honesty purposes. Hopeful more useful comment later.]
If you weaken your will-to-wager assumption and effectively allow your agents to offer bid-ask spreads on bets (I'll buy bets on H for x, but sell them for y) then you get "Dutch book like" arguments that show that your beliefs conform to Dempster-Shafer belief functions, or Choquet capacities, depending on what other constraints you allow.
Or, if you allow that the world is non-classical – that the function that decides which propositions are true is not a classical logic valuation function – then you get similar results.
Other arguments for having probability theory be the right representation of belief include representation theorems of various kinds, Cox’s theorem, going straight from qualitative probability orderings, gradational accuracy style arguments…
I don’t share the intuition that our utility function “seems to consist of real numbers”. It seems to consist of ordinal numbers, at best: this is better than that, which is better than the other. “At best” because it’s not even clear that, for two outcomes neither of which I have ranked higher than the other, I’m generally able to say that I’m indifferent between them. Ambivalence is not necessarily indifference.
I think we should at least mention that there are other good arguments for why adopting probability theory is a good idea. For example Cox's theorem.
This seems to be orthogonal to the current argument. The Dutch book argument says that your will-to-wager fair betting prices for dollar stakes had better conform to the axioms of probability. Cox’s theorem says that your real-valued logic of plausible inference had better conform to the axioms of probability. So you need the extra step of saying that your betting behaviour should match up with your logic of plausible inference before the arguments support each other.
Rigorous formulation of dutch book arguments:
There is some set H of possible pre-existing states of the world, and the information contained within is hidden.
There is some set O of possible outcomes.
An action is a function from H to O.
A choice is a set of 2 or more actions, and an agent is a function from choices to actions within that choice. This is decision, stripped of any notion of probability or utility.
The dutch book arguments give you probability from utility. What we want to define is:
We choose a function $ from the reals to O (not the other way around, as a utility function would). This gives us a map from lotteries (functions from H to the reals) to actions (functions from H to O). $ is a valid currency if:
(We first need to note that a transitive agent must have a preference ordering)
One always prefers a lottery that has a greater value at every state of the world to one with a lesser value.
If X is a lottery, then $(X) is preferable to $(0), $(-X) is preferable to $(0), or they’re all equally preferable.
If X and Y are lotteries, then $(X+Y) is preferable to $(X) iff $(Y) is preferable to $(0)
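For readability, the same three conditions in symbols (my transcription: ≻ is strict preference over actions, ∼ is indifference, and $(X) abbreviates the action you get by composing a lottery with $):

```latex
% X, Y : H \to \mathbb{R} are lotteries; \$(X) denotes the action h \mapsto \$(X(h)).
\begin{align*}
(1)\quad & \forall h \in H:\; X(h) > Y(h) \;\Rightarrow\; \$(X) \succ \$(Y)\\
(2)\quad & \$(X) \succ \$(0) \;\lor\; \$(-X) \succ \$(0) \;\lor\; \$(X) \sim \$(-X) \sim \$(0)\\
(3)\quad & \$(X+Y) \succ \$(X) \;\Leftrightarrow\; \$(Y) \succ \$(0)
\end{align*}
```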
You should be able to derive probabilities from any agent with a valid currency. The proof should also work if the domain of $ is only a dense subset of the reals, such as the rationals.
Every expected-utility maximizer with a sufficient variety of possible utilities has a valid currency.
So if an agent cares about wealth but has diminishing returns, the actual wealth will have increasing returns in the level of currency.
I wonder if this can stand in for the Maher?
Depragmatized Dutch Book Arguments
Yes, this is the article that covers several attempts to depragmatize the arguments. I highly recommend it. Unfortunately it doesn’t explain his own approach in any detail.
Edit: This contains a summary and a claimed refutation of Maher’s theory, but it isn’t complete enough to let me understand what Maher says.
How do you design a Dutch book to take advantage of someone whose estimations sum to less than one, instead of more?
Likewise, what does it look like to have negative b?
You buy the wagers from them instead of selling to them.
So if you have b(h) = −4 you will sell me a unit wager on h for negative $4 (in other words, you will pay me to take the bet). Perhaps I need to rephrase the Will-to-wager assumption differently to make this possibility more explicit? (Edit: I have done so)
Thanks for the revision.
The Will-to-wager assumption feels too strong for me. I would like, for instance, to be able to say “I will wager up to $0.30 on H, or up to $0.60 on ~H. Likewise, I will sell you a wager on H for $0.70 or more, and on ~H for $0.40 or more.”
Of course, that’s effectively setting up my own Dutch book, but it feels very natural to me to associate uncertainty in an outcome with “gaps” that I don’t want to commit to either hypothesis without more data. Then again, I’m a fan of DST, and that’s sort of the point.
I would say that the reason for your intuition that uncertainty ⇒ gaps (which seems separate from risk-aversion-induced gaps) is that the person on the other end of the bet may have information you don’t, and so them offering to bet you counts as Bayesian evidence that the side they’re betting on is correct.
However, e.g., a simple computer program can commit to not knowing anything about the world, and solve this problem.
Well, this is sound betting strategy. As I say, you shouldn’t take bets with 0 expected return unless you just enjoy gambling; it’s a waste of your time. The question we need to answer is whether or not this principle can be given a more abstract or idealized interpretation that says something important about why Bayesianism is rational- the argument certainly doesn’t prove that non-Bayesians are going to get bilked all the time.
I think this misses the point, somewhat. There are important norms on rational action that don’t apply only in the abstract case of the perfect bayesian reasoner. For example, some kinds of nonprobabilistic “bid/ask” betting strategies can be Dutch-booked and some can’t. So even if we don’t have point-valued will-to-wager values, there are still sensible and not sensible ways to decide what bets to take.
The question that needs answering isn’t “What bets do I take?” but “What is the justification for Bayesian epistemology?”.
What the Dutch book theorem gives you are restrictions on the kinds of will-to-wager numbers you can exhibit and still avoid sure loss. It’s a big leap to claim that these numbers perfectly reflect what your degrees of belief ought to be.
But that’s not really what’s at issue. The point I was making is that even among imperfect reasoners, there are better and worse ways to reason. We’ve sorted out the perfect case now. It’s been done to death. Let’s look at what kind of imperfect reasoning is best.
Yes. This was the subject of half the post.
It actually is what was at issue in this year old post and ensuing discussion. There is no consensus justification for Bayesian epistemology. If you would rather talk about imperfect reasoning strategies than the philosophical foundations of ideal reasoning than you should go ahead and write a post about it. It isn’t all that relevant as a reply to my comment.
I’d always thought “What bets do I take” was the justification for Bayesian epistemology. Every policy decision (every decision of any kind) is a statement of the form “I’m prepared to accept these costs to receive these outcomes given these events”, this is a bet. If Bayesian epistemology lets you win bets then that’s all the justification it could ever need.
The above discussion about “what bets do I take?” is about literal, monetary wager-making. The sense in which any decision can be described in a way that is equivalent to such a wager is precisely the question being discussed here.
I’m afraid I don’t understand what problem you are trying to solve here. How is what you want to accomplish different from what is done, for example, in Chapter 1 of Myerson?
Not having read that book I couldn't really tell you how, or even if, what I want to accomplish is different. I'm introducing people to the central arguments of Bayesian epistemology, the right way to interpret those arguments being a matter of controversy in the field. It seems unlikely the matter is conclusively settled in this book, but if it is then Myerson's point needs to be promoted and someone would do well to summarize it here. There are of course many books and articles that go into the matter deeper than I have here; if you are sufficiently familiar with the literature you may have been impressed with someone's treatment of it even though the field has not developed a consensus on the matter. Can you explain Myerson's?
ETA: I just found it on Google. Give me a minute.
Update: Myerson doesn’t mention the Dutch book arguments in the pages I have access to. I’ve just skimmed the chapter and I don’t see anything that obviously would provide a satisfactory interpretation of the Dutch book arguments. You’ll have to make it more explicit or give me time to get the full book and read closely.
Myerson gives an argument justifying probability theory, Bayesian updating, and expected utility maximization based on some plausible axioms about rational decision making.
As I understand it, Dutch book arguments are another way of justifying (some of) these results, but you are seeking ways of doing that justification without assuming that a rational decision maker has to function as a bookie—being willing to bet on either side of any question (receiving a small transaction fee). Decision theoretic arguments, which instead force the decision maker to choose one side or the other (while preserving transitivity of preferences), are an alternative to Dutch book arguments, are what Myerson provides, and are what I thought you were looking for. But apparently I was wrong.
So I repeat: I don’t understand what problem you are trying to solve here.
Again, I don’t have the book!
I realize there are many plausible ways of justifying these results, the vast majority of which I have never read and large classes of which I may have never read. I was particularly interested in arguments in the Dutch book area, but I am of course interested in other ways of doing it. I'm trying to talk about the foundations of our epistemology, the most prominent of which appears to be these Dutch book arguments. I want to know if there is a good way to interpret them or revise them. If they are unsalvageable then I would like to know that. I am interested in alternative justifications and the degree to which they preserve the Dutch book argument's structure and the degree to which they don't. I haven't given a specification of the problem. I've picked a concept which has some problems and suggested we talk about it and work on it.
So why don’t you just explain how Myerson’s argument works.
It is essentially the same as that of Anscombe and Aumann. Since that classic paper is available online, you can go straight to the source.
But the basic idea is straightforward and has been covered by Ramsey, Savage, von Neumann, Luce and Raiffa, and many others. The central assumptions are that preferences are transitive, together with something variously called “The sure thing principle” (Savage) or the “Axiom of Independence” (von Neumann).
Thanks for the link.
So that particular method (the one in the paper you link) has, to my mind, a rather troubling flaw: it bases subjective probability on so-called physical probability. I agree with what appears to be the dominant position here that all probabilities are subjective probabilities, which makes the Anscombe and Aumann proof rather less interesting; in fact it is question begging (though it does work as a way of getting from more certain "objective" probabilities to less certain probabilities). They say that most of the other attempts have not relied on this, so I guess I'll have to look at some of those. I'm also not sure Anscombe and Aumann have in any way motivated agents to treat degrees of belief as probability: they've just defined such an agent, not shown that such conditions are necessary and sufficient for that agent to be considered rational (I suppose an extended discussion of those central assumptions might do the trick).
But yes, these arguments are somewhat on topic.
Jack, you might be more interested in the paper linked to in this post.
This is not as clear as it could be in your original post. It might be helpful for others if you add an introduction that explicitly says what your aim is.
Dear Perfect Bayesian,
What probability do you assign to Collatz conjecture being true? There’s plenty of other number theory statements I could ask about of course.
Do you understand how any definite answer other than 0 and 1 means you just got successfully Dutch booked?
A koan: Should Perfect Bayesian accept a bet that Perfect Bayesian will reject this bet?
The two are closely related.
Isn’t that like saying you got Dutch-booked for assigning 1⁄2 as the probability of heads (because a clairvoyant will be able to foresee that the coin is actually falling tails)?
The relevant point is that, in real life, computation requirements can keep you from calculating the exact Bayesian probability, which will lead to dutch-booking if an agent with enough computing power has a good model of the approximation you’re using.
Correct.
Not at all.
Collatz conjecture is true in every universe, or false in every universe.
You can slice it into a set of trivial statements which are trivially true or trivially false, like “Collatz conjecture is true for N=531” etc., connected by trivially true or trivially false logical statements.
There's no way to meaningfully get any probability but 0 or 1 out of this, other than by claiming that some basic mathematical law is uncertain (and if you believe that, you are more Dutch bookable than the entire Netherlands). I might not know how to Dutch book you yet, but logic dictates such a way exists.
Except, thanks to the Incompleteness Theorem, you have no way to find a definite answer to every such statement. No matter which strategy you choose, and how much time you have, you'll either be inconsistent (Dutch bookable) or incomplete (not able to answer 0 or 1; and as no other answer is valid, any answer you give makes you Dutch bookable).
Do you assign probability 1 to the proposition that 182547553 is prime? Right now, without doing an experiment on a calculator? (well, computer maybe. For most calculators testing this proposition would be somewhat tedious)
If yes, would you be willing to pay me $10 if you ever found out it was not prime?
Conversely
Do you assign probability 0 to the proposition that 182547553 is prime? Right now, without doing an experiment on a calculator?
If yes, would you be willing to pay me $10 if you ever found out it was prime?
EDIT: Actually, I suppose this counts as “doing it again”, even though I’m not Peter de Blanc. I think that makes me a bad bad person.
I suggest you look up the concept of “subjective Bayesian”. Probabilities are states of knowledge. If you don’t know an answer, it’s uncertain. If someone who doesn’t know anything you don’t can look over your odds and construct a knowably losing bet anyway, or construct a winning bet that you refuse, then you are Dutch-bookable.
Also, considering that you have apparently been reading this site for years and you still have not grasped the concept of subjective uncertainty, and you are still working with a frequentist notion of probability, nor yet have you even grasped the difference, I would suggest to you in all seriousness that you seek enlightenment elsewhere.
(Sorry, people, there’s got to be some point at which I can express that. Downvote if you must, I suppose, if you think such a concept is unspeakable or unallowed, but it’s not a judgment based on only one case of incomprehension, of course.)
I don’t know the background conflict. But at least one of taw’s points is correct. Any prior P, of any agent, has at least one of the following three properties:
It is not defined on all X—i.e. P is agnostic about some things
It has P(X) < P(X and Y) for at least one pair X and Y—i.e. P sometimes falls for the conjunction fallacy
It has P(X) = 1 for all mathematically provable statements X—i.e. P is an oracle.
You aren’t excused from having to pick one by rejecting frequentist theory.
To make use of a theory like probability one doesn’t have to have completely secure foundations. But it’s responsible to know what the foundational issues are. If you make a particularly dicey or weird application of probability theory, e.g. game theory with superintelligences, you should be prepared to explain (especially to yourself!) why you don’t expect those foundational issues to interfere with your argument.
About taw’s point in particular, I guess it’s possible that von Neumann gave a completely satisfactory solution when he was a teenager, or whatever, and that I’m just showcasing my ignorance. (I would dearly like to hear about this solution!) Otherwise your comment reads like you’re shooting the messenger.
Logical uncertainty is an open problem, of course (I attended a workshop on it once, and was surprised at how little progress has been made).
But so far as Dutch-booking goes, the obvious way out is 2 with the caveat that the probability distribution never has P(X) < P(X&Y) at the same time, i.e., you ask it P(X), it gives you an answer, you ask it P(X&Y), it gives you a higher answer, you ask it P(X) again and it now gives you an even higher answer from having updated its logical uncertainty upon seeing the new thought Y.
It is also clear from the above that taw does not comprehend the notion of subjective uncertainty, hence the notion of logical uncertainty.
Have any ideas?
Your full endorsement of evaporative cooling is quite disturbing.
Are you at least aware that the epistemic position you're promoting is highly contrarian?
How, exactly, to deal with logical uncertainty is an unsolved problem, no?
It’s not clear why it’s more of a problem for Bayesian than anyone else.
The post you linked to is a response to Eliezer arguing with Hanson, with Eliezer taking a pro-his-own-contrarianism stance. Of course he’s aware of that contrarianism.
Your choice is either accepting that you will be sometimes inconsistent, or accepting that you will sometimes answer “I don’t know” without providing a specific number, or both.
There’s nothing wrong with “I don’t know”.
For Perfect Bayesian or for Subjective Bayesian?
Subjective Bayesian does believe many statements of the kind P(simple math step) = 1, P(X|conjunction of simple math steps) = 1, and yet P(X) < 1.
It does not believe math statements with probability 1 or 0 until it investigates them. As soon as it investigates whether (X|conjunction of simple math steps) is true and determines the answer, it sets P(X)=1.
The problem with “I don’t know” is that sometimes you have to make decisions. How do you propose to make decisions if you don’t know some relevant mathematical fact X?
For example, if you’re considering some kind of computer security system that is intended to last a long time, you really need an estimate for how likely it is that P=NP.
Then you need to fully accept that you will be inconsistent sometimes. And compartmentalize your belief system accordingly, or otherwise find a way to deal with these inconsistencies.
Well, they can wriggle out of this by denying P(simple math step) = 1, which is why I introduced this variation.
Doesn't this imply you'd be willing to accept a bet that 2+2=5 at good enough odds?
This might be pragmatically a reasonable thing to do, but if you accept that all math might be broken, you’ve already given up any hope of consistency.
If physics is deterministic then conditional on the state of the world at the time you make the bet, the probability of heads is either 0 or 1. The only disanalogy with your example is that you may not already have sufficient information to determine how the coin will land (which isn’t even a disanalogy if we assume that the person doesn’t know what the Collatz conjecture says). But suppose you did have that information—there would be vastly more of it than you could process in the time available, so it wouldn’t affect your probability assignments. (Note: The case where the Collatz conjecture turns out to be true but unprovable is analogous to the case where the laws of physics are deterministic but ‘uncomputable’ in some sense.)
Anyway, the real reason why I want to resist your line of argument here is due to Chaitin’s number “omega”, the “halting probability”. One can prove that the bits in the binary expansion of omega are algorithmically incompressible. More precisely: In order to deduce n bits’ worth of information about omega you need at least n—k bits’ worth of axioms, for some constant k. Hence, if you look sufficiently far along the binary expansion of omega, you find yourself looking at an infinite string of “random mathematical facts”. One “ought” to treat these numbers as having probability 1⁄2 of being 1 and 1⁄2 of being 0 (if playing games against opponents who lack oracle powers).
Deterministic world and "too much information to process" are uninteresting. All that simply means is that due to practical constraints, sometimes the only reasonable thing is to assign no probability. As if we didn't know that already. But probabilities still might be assignable in theory.
Except uncomputability means it won’t work even in theory. You are always Dutch bookable.
Chaitin's number is not a mathematical entity; it's a creation of pure metaphysics.
The claim that the kth bit of Chaitin's number is 0 just doesn't mean anything once k becomes big enough to include a procedure to compute Chaitin's number.
Better: Sometimes the only reasonable thing is to assign a probability that’s strictly speaking “wrong”, but adequate if you’re only facing opponents who are (approximately) as hampered as you in terms of how much they know and much they can feasible compute. (E.g. Like humans playing poker, where the cards are only pseudo-random.)
If you want to say this is uninteresting, fine. I’m not trying to argue that it’s interesting.
Sorry, you’ve lost me.
Chaitin’s number is awfully tame by the standards of descriptive set theory. So what you’re really saying here is that you personally regard a whole branch of mathematics as “pure metaphysics”. Maybe a few philosophers of mathematics agree with you—I suspect most do not—but actual mathematicians will carry on studying mathematics regardless.
I’m not sure what you’re trying to say here but what you’ve actually written is false. Why do you think Chaitin’s number isn’t well defined?
For what it’s worth, I think you made a very interesting contribution to this thread, and I find it somewhat baffling that EY responded in the way he did (though perhaps there’s some ‘history’ here that I’m not aware of) and equally baffling that this has apparently caused others to downvote you.
No. I don’t understand that. Could you sketch the argument?
Incidentally, this may be playing with words, but the usual way of expressing the Dutch-book-victimhood of the hapless non-Bayesian agent is to say that a Dutch book can be constructed against him—not merely that a Dutch book exists. Successful gamblers engage in constructive, computable mathematics.
There is no hope for LessWrong as long as people keep conflating Perfect Bayesian and Subjective Bayesian.
Let’s take Subjective Bayesian first. The problem is—Subjective Bayesian breaks basic laws of probability as a routine matter.
Take the simplest possible law of probability P(X) >= P(X and Y).
Now let X be any mathematical theorem which you're not sure about. 1 > P(X) > 0.
Let Y be a statement of the kind "the following proof of X is correct".
Verifying proofs is usually very simple, so once you're asked about P(X and Y), you can confidently reply P(X and Y) = 1. Y is not new information about the world. It is usually a conjunction of trivial statements which you already assigned probability 1.
That is, there's an infinite number of statements for which Subjective Bayesian will reply P(X) < P(X and Y).
For Subjective Bayesian X doesn't even have to involve any infinities; just ask a simple question about cryptography which is pretty much guaranteed to be unsolvable before the heat death of the universe, and you're done.
At this point people far too often try to switch Perfect Bayesian for Subjective Bayesian.
And this is true, Perfect Bayesian wouldn’t make this particular mistake, and all probabilities of mathematical theorems he’d give would be 0 or 1, no exceptions. The problem is—Perfect Bayesians are not possible due to uncomputability.
If your version of Perfect Bayesian is computable, a straightforward application of Rice's Theorem shows he won't be able to answer every question consistently.
If you claim some super-computable oracle version of Perfect Bayesian: well, first, that's already metaphysics not mathematics, but in the end this way of working around uncomputability does not work.
At any mention of uncomputability people far too often try to switch Subjective Bayesian for Perfect Bayesian (see Eliezer’s comment).
Excellent—now you’ve explained what you mean by “Perfect Bayesian” it all makes sense! (Though I can’t help thinking it would have saved time if you’d said this earlier.)
Still, I’m not keen on this redefinition of the word ‘metaphysics’, as though your philosophy of mathematics were ‘obviously correct’ ‘received wisdom’, when actually it’s highly contentious.
Anyway, I think this is a successful attack on a kind of “Bayesian absolutism” which claims that beings who (explicitly or implicitly) assign consistent probabilities to all expressible events, and update their beliefs in the Bayesian manner, can actually exist. That may be a straw man, though.
The obvious solution:
Y is information. It is not information about the world, but it is information—information about math.
I don’t think that works.
Imagine taking the proofs of all provable propositions that can be expressed in less than N characters (for some very large number N), writing them as conjunctions of trivial statements, then randomly arranging all of those trivial statements in one extremely long sequence.
Let Z be the conjunction of all these statements (randomly ordered as above).
Then Z is logically stronger than Y. But our subjective Bayesian cannot use Z to prove X (Finding Moby Dick in the Library of Babel is no easier than just writing Moby Dick oneself, and we’ve already assumed that our subject is unable to prove X under his own steam) whereas he can use Y to prove X.
The Bayesian doesn’t know Z is stronger than Y. He can’t even read all of Z. Or if you compress it, he can’t decompress it.
P(Y|Z)<1.
If you say that then you’re conceding the point, because Y is nothing other than the conjunction of a carefully chosen subset of the trivial statements comprising Z, re-ordered so as to give a proof that can easily be followed.
Figuring out how to reorder them requires mathematical knowledge, a special kind of knowledge that can be generated, not just through contact with the external world, but through spending computer cycles on it.
Yes. It’s therefore important to quantify how many computer cycles and other resources are involved in computing a prior. There is a souped-up version of taw’s argument along those lines: either P = NP, or else every prior that is computable in polynomial time will fall for the conjunction fallacy.
Imagine he has read and memorized all of Z.
If you want to make it a bit less unrealistic, imagine there are only, say, 1000 difficult proofs randomly chopped and spliced rather than a gazillion—but still too many for the subject to make head or tail of. Perhaps imagine them adding up to a book about the size of the Bible, which a person can memorize in its entirety given sufficient determination.
Oh I see. Chopped and spliced. That makes more sense. I missed that in your previous comment.
The Bayesian still does not know that Z implies Y, because he has not found Y in Z, so there still isn’t a problem.
In which sense is Y information?
It’s not even guaranteed to be true (but you can verify that yourself much more easily than X directly).
Compare this with classical result of conjunction fallacy. In experiments people routinely believed that:
P(next year the Soviet Union will invade Poland, and the United States will break off diplomatic relations with the Soviet Union) > P(next year the United States will break off diplomatic relations with the Soviet Union).
Here X=the United States will break off diplomatic relations with the Soviet Union. Y=the Soviet Union will invade Poland.
Wouldn’t your reasoning pretty much endorse what people were doing? (with Y—one possible scenario leading to X—being new information)
Hmmm, I now think the existence of Y is actually a distraction. The underlying process is:
produce estimate for P(X) ⇒ find proof of X ⇒ P(X) increases
If estimates are allowed to change in this manner, then of course they are allowed to change when someone else shows you a proof of X. (since P(X)=P(X) is also a law of probability) If they are not allowed to change in this manner, then subjective Bayesianism applied to mathematical laws collapses anyways.
From a purely human psychological perspective: When someone tells me a proof of a theorem, it feels like I’ve learned something. When I figure one out myself, it feels like I figured something out, as if I’d learned information through interacting with the natural world.
Are you meaning to tell me that no one learns anything in math class? Or they learn something, but the thing they learn isn’t information?
Caveat: Formalizing the concepts requires us to deviate from human experience sometimes. I don’t think the concept of information has been formalized, by Bayesians or Frequentists, in a way that deals with the problem of acting with limited computing time, aka the problem of logical uncertainty.
I think we almost agree already ;-)
Would you agree that the "P(X)" you're describing is more like "some person's answer when asked question X" than "probability of X"?
The main difference between the two is that if "X" and "Y" are the same logical outcome, then their probabilities are necessarily the same, but an actual person can reply differently depending on how the question was formulated.
If you’re interested in this subject, I recommend reading about epistemic modal logic—not necessarily for their solutions, but they’re clearly aware of this problem, and can describe it better than me.
Ok, I understood that, but I still don’t see what it has to do with Dutch books.
P(X) < P(X and Y) gives you a dutch book.
OH I SEE. Revelation.
You can get pwn’d if the person offering you the bets knows more than you. The only defense is to, when you’re uncertain, have bets such that you will not take X or you will not take -X (EDIT: Because you update on the information that they’re offering you a bet. I forgot that. Thanks JG.). This can be conceptualized as saying “I don’t know”
So yeah. If you suspect that someone may know more math than you, don’t take their bets about math.
Now, it’s possible to have someone pre-commit to not knowing stuff about the world. But they can’t pre-commit to not knowing stuff about math, or they can’t as easily.
Another defense is updating on the information that this person who knows more than you is offering this bet before you decide if you will accept it.
That’s not the Dutch book I was talking about.
Let's say you assign "X" a probability of 50%, so you gladly take a 60% bet against "X".
But you assign "X and Y" a probability of 90%, so you just as gladly take an 80% bet for "X and Y".
You just paid $1.20 for combinations of bets that can give you returns of at most $1 (or $0 if X turns out to be true but Y turns out to be false).
This is exactly a Dutch Book.
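A minimal check of those payoffs (my own reading of the numbers above: "taking a 60% bet against X" is treated as paying $0.40 for a $1 ticket on not-X, and the 80% bet on "X and Y" as paying $0.80 for a $1 ticket on the conjunction):

```python
for x, y in [(True, True), (True, False), (False, True), (False, False)]:
    ticket_not_x = 1.00 if not x else 0.00        # bought for $0.40
    ticket_x_and_y = 1.00 if (x and y) else 0.00  # bought for $0.80
    net = (ticket_not_x - 0.40) + (ticket_x_and_y - 0.80)
    print(f"X={x}, Y={y}: ${net:+.2f}")
# -$0.20 when exactly one ticket pays, -$1.20 when neither does: a loss either way.
```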
Given that they are presented at the same time (such as X is a conjecture, Y is a proof of the conjecture), yes, accepting these bets is being Dutch Booked. But upon seeing “X and Y” you would update “X” to something like 95%.
Given that they are presented in order (What bet do you take against X? Now that’s locked in, here is a proof Y. What bet do you take for “X and Y”?) this is a malady that all reasoners without complete information suffer from. I am not sure if that counts as a Dutch Book.
It is trivial to reformulate this problem to X and X’ being logically equivalent, but not immediately noticeable as such, and a person being asked about X’ and (X and Y) or something like that.
Yes, but that sounds like “If you don’t take the time to check your logical equivalencies, you will take Dutch Books”. This is that same malady: it’s called being wrong. That is not a case of Bayesianism being open to Dutch Books: it is a case of wrong people being open to Dutch Books.
You’re very wrong here.
By Goedel’s Incompleteness Theorem, there is no way to “take the time to check your logical equivalencies”. There are always things that are logically equivalent that your particular method of proving, no matter how sophisticated, will not find, in any amount of time.
This is somewhat specific to Bayesianism, as Bayesianism insists that you always give a definite numerical answer.
Not being able to refuse answering (by Bayesianism) + no guarantee of self-consistency (by Incompleteness) ⇒ Dutch booking
I admit defeat. When I am presented with enough unrefusable bets that incompleteness prevents me from realising are actually Dutch Books such that my utility falls consistently below some other method, I will swap to that method.
I’m curious.
pr(0.5) that the Collatz conjecture is true. My belief is synchronic (all mutually exclusive and exhaustive outcomes sum to 1) and diachronic (I will perform Bayesian updates on this belief correctly). I see no way to Dutch book me.
As for the koan;
Let “bet Y” = “Should Perfect Bayesian (PB) accept a bet (bet X) that PB will reject this bet?”
The koan, then, is:
Should PB accept bet X that PB will reject bet Y?
For payoffs of X less than 1 (bet a dollar to win fifty cents, say), PB should not take bet X. For payoffs of X greater than 1 (bet a dollar to receive a dollar fifty, say) PB should take bet X and then reject bet Y. For even payoffs, the opportunity cost of epsilon tells PB not to take the bet.
The trouble is that if the Collatz conjecture is true then {”Collatz conjecture is true”} constitutes an exhaustive set of outcomes, whose probability you think is only 1⁄2.
I think the intended question is “Should PB accept bet X that PB will reject bet X?”
And if the Collatz conjecture is false then {”Collatz conjecture is false”} constitutes a single outcome, whose probability I think is 1⁄2. As of now I don’t know which premise (if true, if false) is actually the case, so I represent my uncertainty about the premises with a probability of 0.5. Representing uncertainty about an outcome (even an outcome that we would know if we were logically omniscient) does not make you dutch-bookable; incorrectly determining your uncertainty makes you dutch-bookable.
Even easier. No. “I’ll bet you a dollar you won’t take this bet.” “So if I take it, I lose a dollar, and if I don’t take it, I lose nothing? I reject the bet. Having rejected the bet, you will now proceed to point out that I would have won the bet. I will now proceed to show you that taking the bet closes off the path where I win the bet (it satisfies the failure condition), and rejecting the bet closes off the path where I win the bet (can’t win a bet you didn’t take), so my choices are between losing the bet and not taking the bet. Out of the only two options available to me, losing the bet costs me a dollar, not taking the bet costs me nothing. Quod erat demonstrandum.”
I am aware that my lack of logical omniscience is a problem for foundations of Bayesian epistemology (and, frankly, in the rest of my life too). The logical omniscience issue is another one I am interested in tackling, if you would like to write a post formulating that problem and maybe suggesting some paths for addressing it, by all means have at it. A set of ambiguous and somewhat rude questions directed at me or no one in particular is not a constructive way to discuss this issue.
I recommend starting here.
Lesswrongian-Bayesian P works as this kind of an epistemic modal operator, and has exactly the problems I mentioned.
It doesn’t. Not even remotely. You clearly have no understanding of the Dutch book concept.
And this demonstrates the problem I have with “koans”. They tend to be used to try to make complete nonsense sound deep. The troubling thing is that it quite often works. The parent is upvoted still at the time of my reply and it almost certainly wouldn’t be if not for koan silliness.