Implementing Decision Theory
I’m implementing decision theory, in Python. There are three parts to this:
1. An XML language for formally specifying decision theory dilemmas. It can express many dilemmas: Newcomb, Transparent Newcomb, Parfit’s Hitchhiker, Death in Damascus, Smoking Lesion, Cosmic Ray, symmetric Prisoner’s Dilemma, Counterfactual Mugging, and Sleeping Beauty.
2. Python code that implements CDT, EDT, and UDT.
3. A simulator in Python that runs a dilemma using a decision theory to resolve choices. The decision theories all log their reasoning, so you can see the calculations they make in (often excruciating) detail. It’s in Python because that’s a language lots of people know.
This is all available on Github. My hope is that it can be a launching point to explore and formalize decision theory.
But all is not well! All of the above is implemented, though sometimes poorly or even incorrectly. I could use some guidance. Some questions, if anyone has answers:
How can the Sleeping Beauty problem be expressed using a Bayesian Network? Imagine that Beauty is given a chance to bet each time she is woken up: there should be a node for “how she bets on Tuesday”, but only if the coin lands on tails. I don’t know how this would be expressed nicely with Bayesian Networks, and this is preventing me from trying to represent dilemmas using them.
I’m representing dilemmas as a tree of possible events, where edges are labeled with things like “probability 50%” and “Alice chooses ‘defect’”. Thus, to represent two coin flips, you would have three nodes: one for the first coin flip, one for the second coin flip if the first landed heads, and one for the second coin flip if the first landed tails. This stands in contrast to Bayesian networks, which would represent two coin flips with just two nodes. My intuition is that it should be possible to convert between the tree representation and the Bayesian network representation, if you’re provided with an equivalence relation between nodes in the tree representation (e.g. saying that the two “second coin flip” nodes are equivalent). Is this a thing? Has it been studied?
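To make the tree representation concrete, here is a minimal sketch of two coin flips as a tree of events (the class and field names are hypothetical, not taken from the actual repo): three internal nodes rather than the two a Bayesian network would use, because the second flip appears once per outcome of the first.

```python
# Hypothetical sketch of the event-tree representation; names are illustrative.
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    # Edges map an annotation like "p=0.5 heads" or "Alice chooses 'defect'"
    # to the child node reached by following that branch.
    edges: dict = field(default_factory=dict)

def leaf(name):
    return Node(name)

# Two coin flips: one node for the first flip, and one "second flip" node
# for each outcome of the first.
second_after_heads = Node("flip 2 (after heads)",
                          {"p=0.5 heads": leaf("HH"), "p=0.5 tails": leaf("HT")})
second_after_tails = Node("flip 2 (after tails)",
                          {"p=0.5 heads": leaf("TH"), "p=0.5 tails": leaf("TT")})
tree = Node("flip 1",
            {"p=0.5 heads": second_after_heads, "p=0.5 tails": second_after_tails})
```

The equivalence relation mentioned above would then identify the two "flip 2" nodes, which is exactly the information a conversion to a two-node Bayesian network would need.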
I called one of the decision theories I implemented “UDT”, but it’s actually what I would term “pre-commitment decision theory”: “act in the way you would have pre-committed to act”. Is there a name for that decision theory? I suspect that it’s indistinguishable from UDT and FDT in the space of single-agent dilemmas expressible using the XML language.
I’m not happy with the current representation of Smoking Lesion and Cosmic Ray. How would you represent dilemmas like these? Remember that you need both a way of writing down the problem, and EDT code that takes that written representation and uses it to do poorly on a dilemma.
How does one correctly handle multi-agent dilemmas, in which you know the other agents follow the same decision theory? My implementation of “UDT” defects in a prisoner’s dilemma against an agent that it knows is following the same decision procedure. More precisely: Alice and Bob follow the same decision procedure, and they both know it. Alice will choose between cooperate/defect, then Bob will choose between cooperate/defect without knowing what Alice picked, then the utility will be delivered. My “UDT” decision procedure reasons as follows for Alice: “if I had pre-committed to cooperate, then Bob would know that, so he would defect, therefore I defect”. Is there a known way out of this, besides special casing symmetric dilemmas, which is brittle? The FDT paper said something about “policies” as a way out of this. I have two ideas, which could both plausibly be referred to as ‘policies’: (i) when picking your action for a decision, you actually pick a set of actions for all agents and decisions, and use game theory to make it a satisfactory one for all agents; or (ii) when picking your action for a decision, you don’t actually pick an action, you pick a mapping from what the other agent would do to your action.
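A toy sketch of idea (i), under the assumption that both agents run the identical procedure (so only symmetric action profiles are actually attainable): choose a joint profile for all agents, then pick the feasible profile that is best for the shared procedure. The payoff table is the standard prisoner’s dilemma; everything here is illustrative, not from the repo.

```python
# Hypothetical sketch of policy idea (i); payoffs are the standard PD.
from itertools import product

PAYOFFS = {  # (alice_action, bob_action) -> (alice_utility, bob_utility)
    ("C", "C"): (3, 3), ("C", "D"): (0, 5),
    ("D", "C"): (5, 0), ("D", "D"): (1, 1),
}

def joint_policy_choice():
    # Both agents run this same procedure, so whatever it outputs, both
    # will play it: only symmetric profiles are feasible.
    feasible = [(a, b) for a, b in product("CD", repeat=2) if a == b]
    return max(feasible, key=lambda profile: PAYOFFS[profile][0])

print(joint_policy_choice())  # -> ('C', 'C')
```

Restricting to symmetric profiles is what dodges the “if I pre-commit to cooperate, Bob defects” regress; the open question is how to justify that restriction in general rather than special-casing it.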
How valuable is this? I think the decision theorists here are the main audience. I’m finding myself writing math now, and can imagine writing proofs within this framework. But it can only talk about dilemmas expressible within the XML-dilemma-language. I can imagine someone arguing that the interesting questions all lie outside of that.
Some things that surprised me:
CDT and EDT aren’t well defined, in the sense that their behavior depends on their prior over what their behavior is. This was mentioned in the FDT paper, but it didn’t really hit home until I was coding them and needed to pick a prior.
Decision theory is complicated. The logs for the infallible Newcomb problem are ~100 lines long for all the decision theories, and none of that is extraneous (verbose, yes, but not extraneous). There’s just a lot of “what I should do depends on whether box B is full, which depends on what the predictor predicted I would do, which depends on …”.
The space of dilemmas is… smaller than I would have expected? I thought I’d be able to express a few of them, and then each new one would need an extension in the dilemma language. Instead, the dilemma language is very small.
My view of decision theory now is that it’s all about fixpoints. You solve some big equation, and inside of it is the same equation, and there are multiple fixpoint solutions, and you pick the (hopefully unique) best one.
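As a toy illustration of the fixpoint view, take infallible Newcomb (numbers and names here are illustrative, not from the repo): a policy is self-consistent exactly when the predictor’s prediction matches it, so both pure policies are fixpoints of the equation, and the decision theory’s job is to pick the best one.

```python
# Toy sketch: multiple self-consistent solutions, pick the best fixpoint.
def value_if_committed(policy):
    # With an infallible predictor, committing to `policy` means the
    # prediction matches it. Box B holds $1M iff one-boxing was predicted;
    # two-boxing adds the $1K in box A.
    box_b = 1_000_000 if policy == "one-box" else 0
    return box_b + (1_000 if policy == "two-box" else 0)

fixpoints = ["one-box", "two-box"]  # both are self-consistent
best = max(fixpoints, key=value_if_committed)
print(best)  # -> one-box
```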
Again, more detail is available in the Github repo README.
Looks very interesting, I’ll make sure to check out the git repo! Thanks for developing that!
As you’re perhaps already aware, (Everitt, Leike & Hutter 2015) comes with a jupyter notebook that implements EDT and CDT in sequential decision problems. Perhaps useful as a comparison or a source of inspiration.
Would you say that this is similar to the connection that exists between fixed points and Nash equilibria?
I was not aware of Everitt, Leike & Hutter 2015, thank you for the reference! I only delved into decision theory a few weeks ago, so I haven’t read that much yet.
Nash equilibria come from the fact that your action depends on your opponent’s action, which depends on your action. When you assume that each player will greedily change their action if it improves their utility, the Nash equilibria are the fixpoints at which no player changes their action.
In single-agent decision theory problems, your (best) action depends on the situation you’re in, which depends on what someone predicted your action would be, which (effectively) depends on your action.
If there’s a deeper connection than this, I don’t know it. There’s a fundamental difference between the two cases, I think, because a Nash equilibrium involves multiple agents that don’t know each other’s decision processes (problem statement: maximize the outputs of two functions independently), while single-agent decision theory involves just one agent (problem statement: maximize the output of one function).
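The “Nash equilibria are the fixpoints of greedy best-response” framing above can be sketched directly (a toy example, not from the repo; payoffs are the standard prisoner’s dilemma): iterate best responses until no player changes their action.

```python
# Sketch of Nash equilibria as fixpoints of best-response dynamics.
PD = {  # (p0_action, p1_action) -> (p0_utility, p1_utility)
    ("C", "C"): (3, 3), ("C", "D"): (0, 5),
    ("D", "C"): (5, 0), ("D", "D"): (1, 1),
}

def best_response(my_index, opponent_action):
    def utility(my_action):
        profile = ((my_action, opponent_action) if my_index == 0
                   else (opponent_action, my_action))
        return PD[profile][my_index]
    return max("CD", key=utility)

def best_response_dynamics(profile=("C", "C")):
    while True:
        new = (best_response(0, profile[1]), best_response(1, profile[0]))
        if new == profile:  # fixpoint: no player wants to change
            return profile
        profile = new

print(best_response_dynamics())  # -> ('D', 'D')
```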
I’d say that the connection is: Single-agent problems with predictors can be interpreted as sequential two-player games where the (perfect) predictor is a player who observes the action of the decision-maker and best-responds to it. In game-theoretic jargon, the predictor is a Stackelberg follower, and the decision-maker is the Stackelberg leader. (Related: (Kovarik, Oesterheld & Conitzer 2023))
What’s the utility function of the predictor? Is there necessarily a utility function for the predictor such that the predictor’s behavior (which is arbitrary) corresponds to maximizing its own utility? (Perhaps this is mentioned in the paper, which I’ll look at.)
EDIT: do you mean to reduce a 2-player game to a single-agent decision problem, instead of vice-versa?
[Apologies for the delay]
You’re right, the predictor’s behavior might not be compatible with utility maximization against any beliefs. I guess we’re often interested in cases where we can think of the predictor as an agent. The predictor’s behavior might be irrational in the restrictive above sense,[1] but to the extent that we think of it as an agent, my guess is that we can still get away with using a game theoretic-flavored approach.
[1] For instance, if the predictor is unaware of some crucial hypothesis, or applies mild optimization rather than expected value maximization
FWIW we implemented the FDT, CDT, and EDT in Haskell a while ago.
https://github.com/DecisionTheory/DecisionTheory
Oh, excellent!
It’s a little hard to tell from the lack of docs, but you’re modelling dilemmas with Bayesian networks? I considered that, but wasn’t sure how to express Sleeping Beauty nicely, whereas it’s easy to express (and gives the right answers) in my tree-shaped dilemmas. Have you tried to express Sleeping Beauty?
And have you tried to express a dilemma like smoking lesion where the action that an agent takes is not the action their decision theory tells them to take? My guess is that this would be as easy as having a chain of two probabilistic events, where the first one is what the decision theory says to do and the second one is what the agent actually does, but I don’t see any of this kind of dilemma in your test cases.
My solution, which assumes computation is expensive, is to reason about other agents based on their behavior towards a simplified-model third agent; the simplest possible version of this is “Defect against bots who defect against cooperate-bot, otherwise cooperate” (and this seems relatively close to how humans operate—we don’t like people who defect against the innocent).
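The rule above can be sketched with bots as functions from opponent to action (names are illustrative, not an actual implementation): judging the opponent by its play against the simplified cooperate-bot needs exactly one nested prediction, so the recursion terminates even when the bot faces a copy of itself.

```python
# Sketch of "defect against bots who defect against cooperate-bot".
def cooperate_bot(opponent):
    return "C"

def defect_bot(opponent):
    return "D"

def selfish_altruist(opponent):
    # Judge the opponent by how it treats the simplified third agent:
    # defect iff it defects against cooperate-bot.
    return "D" if opponent(cooperate_bot) == "D" else "C"

print(selfish_altruist(defect_bot))        # -> D
print(selfish_altruist(cooperate_bot))     # -> C
print(selfish_altruist(selfish_altruist))  # -> C  (mutual cooperation)
```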
Ah, so I’m interested in normative decision theory: how one should ideally behave to maximize their own utility. This is what e.g. UDT&FDT are aiming for. (Keep in mind that “your own utility” can, and should, often include other people’s utility too.)
Minimizing runtime is not at all a goal. I think the runtime of the decision theories I implemented is something like doubly exponential in the number of steps of the simulation (the number of events in the simulation is exponential in its duration; each decision typically involves running the simulation using a trivial decision theory).
That’s an interesting approach I hadn’t considered. While I don’t care about efficiency in the “how fast does it run” sense, I do care about efficiency in the “does it terminate” sense, and that approach has the advantage of terminating.
You’re going to defect against UDT/FDT then. They defect against cooperate-bot. You’re thinking it’s bad to defect against cooperate-bot, because you have empathy for the other person. But I suspect you didn’t account for that empathy in your utility function in the payoff matrix, and that if you do, you’ll find that you’re not actually in a prisoner’s dilemma in the game-theory sense. There was a good SlateStarCodex post about this that I can’t find.
Evolution gave us “empathy for the other person”, and evolution is a reasonable proxy for a perfectly selfish utility machine, which is probably good evidence that this might be an optimal solution to the game theory problem. (Note: Not -the- optimal solution, but -an- optimal solution, in an ecosystem of optimal solutions.)
If you think evolution has a utility function, and that it’s the SAME function that an agent formed by an evolutionary process has, you’re not likely to get me to follow you down any experimental or reasoning path. And if you think this utility function is “perfectly selfish”, you’ve got EVEN MORE work cut out in defining terms, because those just don’t mean what I think you want them to.
Empathy as a heuristic to enable cooperation is easy to understand, but when normatively modeling things, you have to deconstruct the heuristics to actual goals and strategies.
Take a step back and try rereading what I wrote in a charitable light, because it appears you have completely misconstrued what I was saying.
A major part of the “cooperation” involved here is in being able to cooperate with yourself. In an environment with a well-mixed group of bots each employing differing strategies, and some kind of reproductive rule (if you have 100 utility, say, spawn a copy of yourself), Cooperate-bots are unlikely to be terribly prolific; they lose out against many other bots.
In such an environment, a stratagem of defecting against bots that defect against cooperate-bot is a -cheap- mechanism of coordination; you can coordinate with other “Selfish Altruist” bots, and cooperate with them, but you don’t take a whole lot of hits from failing to defect against cooperate-bot. Additionally, you’re unlikely to run up against very many bots that cooperate with cooperate-bot, but defect against you. As a coordination strategy, it is therefore inexpensive.
And if “computation time” is considered as an expense against utility, which I think reasonably should be the case, you’re doing a relatively good job minimizing this; you have to perform exactly one prediction of what another bot will do. I did mention this was a factor.