I’m not sure what to think of this paper. It’s quite long and I haven’t finished checking it for sanity. Nevertheless, I noticed it hadn’t made its way here, and there are mighty few papers that cite the FDT paper, so I figured I’d drop it off rather than leave it sitting open in a tab forever.
Abstract:
Functional decision theory (FDT) is a fairly new mode of decision theory and a normative viewpoint on how an agent should maximize expected utility. The current standard in decision theory and computer science is causal decision theory (CDT), largely seen as superior to the main alternative evidential decision theory (EDT). These theories prescribe three distinct methods for maximizing utility. We explore how FDT differs from CDT and EDT, and what implications it has on the behavior of FDT agents and humans. It has been shown in previous research how FDT can outperform CDT and EDT. We additionally show FDT performing well on more classical game theory problems and argue for its extension to human problems to show that its potential for superiority is robust. We also make FDT more concrete by displaying it in an evolutionary environment, competing directly against other theories. All relevant code can be found here: https://github.com/noahtopper/FDT-in-an-Evolutionary-Environment.
If anyone has any thoughts on this paper in particular, I’d love to hear them.
The whole fascination with decision theory is a weird LW peculiarity. In mainstream ML/RL it seems nobody cares and EDT is just assumed: ‘Bayesian decision theory’ is just EDT, it is what AIXI uses, etc. Why would you ever impose this additional constraint of physical causality? It seems that EDT’s simpler rule, ‘just pick the best predicted option’, dominates (and naturally the paper you linked uses an evolutionary algo to compare FDT only to the obviously inferior CDT, not to EDT).
The action you chose becomes evidence in the world conditioned on you choosing it, regardless of whether that is ‘causally possible’. If the urge to smoke and cancer are independently caused by a gene, then in the world where you choose to smoke, that choice is evidence of having the gene.
XOR Blackmail is (in my view) perhaps the clearest counterexample to EDT:
(Styling mine, not original.) EDT pays the $1,000 for nothing: it has absolutely no influence on whether or not the agent’s house is infested with termites.
I think the thing @jacob_cannell is imagining is not plain CDT, EDT, or FDT, and writing out what it is he’s imagining in the language of https://arxiv.org/abs/2307.10987 would clarify. I suspect the RL thing he’s imagining is some mix of CDT and EDT depending on the amount of experience the agent has with a context. He’d have to clarify. I bring this up because I anticipate any language model having the correct response to that example scenario, because it has experience with those dynamics in previous language, but it’ll be vulnerable to tweaked versions of that, and yet also behave CDTishly in some scenarios. These decision theories are “pure”, approximation-free models, and so approximate learning systems behave differently sometimes.
This just seems like a variant of Newcomb’s problem, and EDT is naturally optimal here (as it is everywhere).
Assume the predictor is never wrong and never lies. Then upon receiving the letter we know that in worlds where the house is not infested we pay, and in worlds where the house is infested we do not. So we pay and win $999,000, which is optimal.
Perfect predictors are roughly equivalent to time travel. It’s equivalent to filtering out all universes where the house is not infested and we don’t pay, and all those where the house is infested and we pay.
To compare decision algos we need a formal utility measure for our purposes of comparison. Given any such formal utility measure, we could then easily define the optimal decision algorithm: it is whatever argmaxes that measure! EDT is simply that, for the very reasonable expected utility metric.
Given that you receive the letter, paying is indeed evidence for not having termites and winning $999,000. EDT is elegant, but still can’t be correct in my view. I wish it were, and have attempted to “fix” it.
My take is this. Either you have the termite infestation, or you don’t.
Say you do. Then
being a “payer” means you never receive the letter, as both conditions are false. As you don’t receive the letter, you don’t actually pay, and lose the $1,000,000 in damages.
being a “non-payer” means you get the letter, and you don’t pay. You lose $1,000,000.
Say you don’t. Then
payer: you get the letter, pay $1,000. You lose $1,000.
non-payer: you don’t get the letter, and don’t pay $1,000. You lose nothing.
Being a payer has the same result when you do have the termites, but is worse when you don’t. So overall, it’s worse. Being a payer or a non-payer only influences whether or not you get the letter, and this view is more coherent with the intuition that you can’t possibly influence whether or not you have a termite infestation.
In your problem description you said you receive the letter:
Given that you did receive the letter, that eliminates 2 of the 4 possible worlds, and we are left with only (infested, dont_pay) and (uninfested, pay). Then the choice is obvious. EDT is correct here.
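The four possible worlds in this exchange can be enumerated mechanically. The following is a minimal sketch, assuming the letter is sent exactly when one (and only one) of "the house is infested" and "the agent is a payer" holds, and using the $1,000,000 damage and $1,000 payment figures from the thread. The function and variable names are made up for illustration.

```python
from itertools import product

DAMAGE = 1_000_000  # termite damage, figure from the thread
PAYMENT = 1_000     # amount demanded in the letter

def letter_sent(infested: bool, payer: bool) -> bool:
    # Assumed rule: the letter is sent iff exactly one of the two holds.
    return infested != payer

def loss(infested: bool, payer: bool) -> int:
    # A payer only actually pays in worlds where the letter arrives.
    paid = PAYMENT if (payer and letter_sent(infested, payer)) else 0
    return paid + (DAMAGE if infested else 0)

worlds = [
    {"infested": i, "payer": p, "letter": letter_sent(i, p), "loss": loss(i, p)}
    for i, p in product([True, False], repeat=2)
]

# Conditioning on having received the letter keeps only two worlds.
with_letter = [w for w in worlds if w["letter"]]

for w in worlds:
    print(w)
```

Restricted to `with_letter`, the payer world has the smaller loss, which is the comparison one side of this exchange relies on. Across all four rows, for a fixed infestation status the payer policy never does better than the non-payer policy, which is the comparison the other side relies on.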
Obviously if you don’t receive the letter you have more options, but then it’s not much of an interesting problem.
This intuition is actually false for perfect predictors. A perfect predictor could simulate your mind (along with everything else) perfectly, which is somewhat equivalent to time travel. It’s not actual time travel of course, but in these ‘perfect prediction’ scenarios your future (perfectly predicted) decisions have already affected your past.
“In your problem description you said you receive the letter”
True, but the problem description also specifies subjunctive dependence between the agent and the predictor. When the predictor made her prediction, the letter had not yet been sent. So the agent’s decision influences whether or not she gets the letter.
“This intuition is actually false for perfect predictors.”
I agree (and have written extensively on the subject). But it’s the prediction the agent influences, not the presence of the termite infestation.
The payoff and optimal move naturally depend on the exact time of measurement. Before receiving any letter you can save $1000 by precommitting to not paying: but that is a move both FDT and EDT will make. But after receiving the letter (which you assumed) the optimal move is to pay the $1000 to save $1M. FDT, from my understanding, fails here as it retroactively precommits to not paying and thus loses $1M. So this is a good example of where EDT > FDT.
The only example I’ve seen so far where the retroactive precommitment of FDT actually could make sense is the specific variant 5 from here where we measure utility before the agent knows the rules or has observed anything. And even in that scenario FDT only has a net advantage if it is optimal to make the universal precommitment everywhere. EDT can decide to do that: EDT->FDT is allowed, but FDT can never switch back. So in that sense EDT is ‘dominant’, or the question reduces to: is the universal precommitment of FDT a win on net across the multiverse? Which is far from clear.
The trick with FDT is that FDT agents never receive the letter and never pay. FDT payoff is p*(-1000000), where p is the probability of infestation. EDT payoff is p*(-1000000) + (1-p)*(-1000), which seems to me to speak for itself.
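The two payoff expressions above can be evaluated for a few values of p. This is only a sketch of the arithmetic as stated in this comment, measured before any letter is observed; the function names are hypothetical, and the values of p are arbitrary illustrations.

```python
def fdt_expected_payoff(p: float) -> float:
    # Non-payer policy: never pays, only loses when the house is infested.
    return p * -1_000_000

def edt_expected_payoff(p: float) -> float:
    # Payer policy: additionally pays 1,000 in the uninfested worlds.
    return p * -1_000_000 + (1 - p) * -1_000

for p in (0.0, 0.01, 0.5, 1.0):
    gap = fdt_expected_payoff(p) - edt_expected_payoff(p)
    print(f"p={p}: FDT {fdt_expected_payoff(p):,.0f}, "
          f"EDT {edt_expected_payoff(p):,.0f}, gap {gap:,.0f}")
```

The gap between the two expressions is (1-p)*1000, so under this way of measuring, the policies only tie when infestation is certain.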
The problem clearly states:
So that is baked into the environment, it is a fact. The EDT payoff is maximal.
For the same reason one models anything else using cause and effect.
“Cause and effect” is already subsumed by model-based world prediction. Regardless, where is an example of a problem EDT does not handle correctly? It correctly one-boxes, etc.
Sure, which is why it’s interesting to think about decision theories that can handle that. You can’t just assume EDT when you’re doing causally calibrated model-based prediction.
EDT correctly handles everything already:
V(A) = \sum_j P(O_j | A) U(O_j)
The expected utility of an action is simply the probability/measure of each possible future conditional on that action weighted by the utility of each such future universe—ie the expected value of the full sub branch stemming from the action. It simply can’t be anything else—that is the singular unique correct definition. Once you’ve written that out, you are done with ‘decision theory’; the hard part is in actually learning to predict the future with any accuracy using limited compute.
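As a minimal sketch of the rule just described, the code below computes V(A) as a probability-weighted sum of utilities and picks the argmax. The names `edt_value` and `edt_choose`, the toy conditional distribution, and the toy utilities are all assumptions for illustration; the hard part the comment mentions (learning P(O_j | A)) is taken as given here.

```python
from typing import Callable, Dict, Hashable, Iterable

Outcome = Hashable
Action = Hashable

def edt_value(
    action: Action,
    cond_prob: Callable[[Action], Dict[Outcome, float]],
    utility: Callable[[Outcome], float],
) -> float:
    # V(A) = sum_j P(O_j | A) * U(O_j)
    return sum(p * utility(o) for o, p in cond_prob(action).items())

def edt_choose(
    actions: Iterable[Action],
    cond_prob: Callable[[Action], Dict[Outcome, float]],
    utility: Callable[[Outcome], float],
) -> Action:
    # Pick the action with the highest conditional expected utility.
    return max(actions, key=lambda a: edt_value(a, cond_prob, utility))

# Toy numbers, purely illustrative.
toy_model = {
    "a": {"good": 0.8, "bad": 0.2},
    "b": {"good": 0.3, "bad": 0.7},
}
toy_utility = {"good": 10.0, "bad": 0.0}

best = edt_choose(toy_model, lambda a: toy_model[a], lambda o: toy_utility[o])
print(best)
```

Everything contested in this thread lives in what `cond_prob` is allowed to condition on and when it is evaluated; the argmax itself is trivial.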
An agent is presented with a transparent box, which contains either $1 or $100. They have the option to open the box and take the money, or leave. Perfect predictor Omega had previously set up the box according to the following rule: if they predicted that the agent would take $1, they put in $1 with 99% probability, otherwise $100. If they predicted that the agent would leave $1, they put in $100 with 99% probability, otherwise they put in $1.
From the EDT point of view, there are two separate decision problems here, one for each amount that the agent sees in the box. The world model implicit in P(O_j|A) can’t depend upon how or why Omega put various amounts of money in the box, because the agent has already ruled out being in a world in which Omega put a different amount of money in the box.
Obviously it answers “take the money” for each. Over all universes then, 99% of EDT agents get $1 and 1% get $100, for an average performance of $1.99.
From the FDT point of view there are not two separate decision problems here, but optimization of a strategy mapping a 1-bit input (amount of money seen) to a 1-bit output (take or leave). The optimal function is to always leave $1 and always take $100. Then over all universes, 99% of FDT agents get $100 and 1% get nothing for an average performance of $99.
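The strategy-level comparison can be checked by enumerating all four mappings from observation to action. This is a sketch under the rules stated in the problem (Omega's 99% rule, taking yields the box contents, leaving yields nothing); the function names are hypothetical.

```python
from itertools import product

def prob_box_has_1(leaves_1: bool) -> float:
    # Omega's rule from the problem statement: predicted "take $1" gives $1
    # with probability 0.99; predicted "leave $1" gives $1 with probability 0.01.
    return 0.01 if leaves_1 else 0.99

def expected_payoff(take_1: bool, take_100: bool) -> float:
    p1 = prob_box_has_1(leaves_1=not take_1)
    gain_if_1 = 1 if take_1 else 0
    gain_if_100 = 100 if take_100 else 0
    return p1 * gain_if_1 + (1 - p1) * gain_if_100

# Keys are (take when seeing $1, take when seeing $100).
strategies = {
    (take_1, take_100): expected_payoff(take_1, take_100)
    for take_1, take_100 in product([True, False], repeat=2)
}
best = max(strategies, key=strategies.get)

for strategy, value in strategies.items():
    print(strategy, round(value, 2))
```

The "always take" strategy scores about 1.99 and "leave $1, take $100" scores about 99, matching the averages quoted above. Note that this measures expected payoff over both box contents, before the observation.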
If we disallow commitments or self-modifications and measure utility after the transparent box is already observed, then absent additional considerations taking the $1 results in a $1 gain over not taking it.
But if we consider actions taken before observing the transparent box, then EDT can also precommit to always leaving $1 (ie it can take preaction to remove the ability to choose in the later action, which is the optimal move here)
The key is whether we allow the ability to make a binding precommitment (or equivalent self modification) action before the main decision. If so then EDT can (and will!) exploit that. FDT must rely on the same mechanism, so it has no advantage. If your response is “well that is what FDT is” then my response is that isn’t a new decision algorithm that disagrees with the fundamentals, it’s just a new type of implicit action allowed in these problems.
The main difference is that EDT quantifies over actions, while FDT quantifies over strategies that choose actions, when determining what action to take. In the end, they both tell you which action of the available actions you should take given an epistemic state. So yes, that is what FDT is, and it is different from EDT.
FDT does not require precommitment as an available action, since the decision theory itself tells you what action you should take given your epistemic state. FDT tells you “if you’re in this game and you see $1, you should leave it”, no precommitment or self-modification required. You either comply with the FDT recommendation at any given time, or not.
There is no need to mess about with “well, an EDT agent in this epistemic situation should take the $1, but if they self-modified then they aren’t capable of following the EDT recommendation anymore which is good because they on average end up better off”, or any of that mess with commitment races, or whatever.
To meaningfully compare decision algorithms, we first need some precise way of scoring them. Given a function which takes as input an environment and a decision algorithm and outputs a utility suitable for comparison, we can then easily define the optimal decision algorithm: it is just the one that argmaxes our utility ranking function, whatever our utility function is.
You are implicitly using something like expected utility as the utility function when you say “Then over all universes, 99% of FDT agents get $100 and 1% get nothing for an average performance of $99.”.
We cannot compare decision algorithms that do not operate on the same types. The only valid comparison is an evaluation on exactly bit-identical environment situations and bit-identical algorithm output options.
So there are 3 wildly different environments in your example:
1. You observe $1 in a transparent box, actions are {take, leave}
2. You observe $100 in a transparent box, actions are {take, leave}
3. You are about to observe one of [$1, $100] in a transparent box, and your action set includes a wide variety of self-modifications (or equivalently, precommitments).
FDT doesn’t outperform EDT on any of these 3 specific subproblems. Any argument that FDT is superior based on comparing completely different decision problem setups is just a waste of breath.
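The point about scoring can be sketched as a small harness: an environment is a function from a policy to a utility, and the "optimal" policy is whatever argmaxes it. The two environments below are simplified stand-ins for "utility measured after $1 is observed" and "utility measured before any observation"; they, the policy encoding, and the names are assumptions for illustration, not a full model of the three scenarios listed above.

```python
from typing import Callable, Dict, Tuple

Policy = Tuple[bool, bool]  # (take when seeing $1, take when seeing $100)
Environment = Callable[[Policy], float]

def after_seeing_1(policy: Policy) -> float:
    # Utility measured only after $1 has been observed; box content is fixed.
    take_1, _ = policy
    return 1.0 if take_1 else 0.0

def before_observation(policy: Policy) -> float:
    # Expected utility measured before the box is seen, under Omega's 99% rule.
    take_1, take_100 = policy
    p1 = 0.99 if take_1 else 0.01
    return p1 * (1.0 if take_1 else 0.0) + (1 - p1) * (100.0 if take_100 else 0.0)

def best_policy(env: Environment, candidates: Dict[str, Policy]) -> str:
    # The optimal candidate is simply the argmax of the chosen scoring function.
    return max(candidates, key=lambda name: env(candidates[name]))

candidates = {"always_take": (True, True), "leave_1_take_100": (False, True)}

print(best_policy(after_seeing_1, candidates))
print(best_policy(before_observation, candidates))
```

The two environments rank the same pair of policies in opposite order, which is one way to see that the disagreement here is about which scoring function is the right one rather than about the arithmetic.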
In actual practice any implementation of FDT has to use some form of self-modification (write new controller code for some specific situation) or some form of binding precommitment (which also absolutely can work, humans have been using them for ages), and EDT could also use those options.
Without self-modifications or binding precommitments you are leaving money on the table in one of these scenarios. If your response is “FDT doesn’t need precommitments or self-modification, it just always figures out when to cooperate even with past selves”, then it leaves money on the table in scenario 1. EDT is optimal in each of these 3.
FDT outperforms EDT on
4. You are about to observe one of [$1, $100] in a transparent box, but your action set doesn’t include any self-modifications or precommitments.
5. You are about to observe one of [$1, $100] in a transparent box, but you don’t know about it and will know about the rules of this game only when you will already see the box.
Probably false: to show this you need to describe how to implement FDT on an actual computer without self-modification (which I define broadly enough to specifically include any likely plausible implementations) or precommitments.
To the extent FDT wins there it only does so at the expense of losing in more likely scenarios with alternate rules or no rules at all. I already predicted this response and you are not responding to my predicted-in-advance counter: that FDT loses in scenario 1 for example (which is exactly the same as your scenario 5, but we start the scenario and thus measure performance only after the observation of [$1] in the transparent box, so any gains in alternate universes are ignored in our calculation of utility).
EDT-agent in (5) goes in (1) with 99% probability and in scenario (2) with 1% probability. It wins 99%*$1+1%*$100=$1.99 in expectation.
FDT-agent in (5) goes in (1) with 1% probability and in scenario (2) with 99% probability. It wins 1%*$0+99%*$100=$99 in expectation.
IMO, to say that the FDT-agent loses in (1) and is therefore inferior to the EDT-agent is like saying that it’s better to choose to roll a die that wins on 6 than to roll a die that wins on 1-5, because the first option is better in the case where the die rolls 6.
Under what exact set of alternate rules does the EDT-agent win more in expectation?
Should be obvious from your 5 example. In 5 at the moment of decision (which really is a preaction) the agent doesn’t know about the scenario yet. There are an infinite set of such scenarios with many different rules—including the obvious vastly more likely set of environments where there is no predictor, the predictor is imperfect, the rules are reversed, “FDT agents lose”, etc.
The FDT-agent obviously never decides “I will never ever take $1 from the box”. It decides “I will not take $1 in the box if the rules of the situation I’m in are like <rules of this game>”.
Only it’s more general, something like “When I realise that it would be better if I made some precommitment earlier, I act like I would act if I actually made it” (not sure that this phrasing is fully correct in all cases).
Which means it obviously loses in my earlier situation 1. It is optimal to make binding commitments earlier only because we are defining optimality based on measuring across both [$1,$100] universes. But in situation 1 we are measuring utility/optimality only in the [$1] universe—as that is now all that exists—and thus the optimal action (which optimal EDT takes) is to take the $1.
In 1 it is obviously suboptimal to retroactively bind yourself to a hypothetical precommitment you didn’t actually make.
Well, yes, it loses in (1), but it’s fine, because it wins in (4) and (5) and is on par with EDT-agent in (3). (1) is not the full situation in this game, it’s always a consequence of (3), (4) or (5), depending on interpretation, the rules don’t make sense otherwise.
PS. If FDT-agent is suddenly teleported into situation (1) in place of some other agent by some powerful entity who can deceive the predictor and the predictor predicted the behaviour of this other agent who was in the game before, and FDT-agent knows all this, it obviously takes $1, why not?
As for 4 - even just remembering anything is a self modification of memory.
From your problem description earlier you said:
So some agents do find themselves in (1), and it’s obviously optimal to take the $1 if you can. FDT is in some sense giving up utility here by using a form of retroactive precommitment, hopefully in exchange for utility on other branches. The earlier decision to precommit (whether actually made or later simulated/hallucinated) sacrifices utility of some future selves in exchange for greater utility to other future selves.
So the sequence of events from the agent’s perspective is
A. observe one of [$1,$100] in transparent box (without any context or rules)
B. receive the info about Omega’s predictions
C. decide to take or leave
At moment A and later the agent has already observed $1 or $100. In universes where they observe $1 at A, the optimal decision at C is to take. In universes where they observe $100 at A, the optimal decision at C is also to take.
The FDT move is obviously optimal for 5 only if we measure utility at a point in time before A, when the agent doesn’t know anything about this environment yet (and so could plausibly be in any of an infinite set of alternatives) and we measure only over the subset of universes conditioned on our secret knowledge of the problem setup.
In principle it seems wrong to measure utility at the moment in time right before A on the basis of our knowledge; it seems we should only measure it based on the agent’s knowledge. This means we need to sum our expectation over all possible universes consistent with those facts. The set of universes that proceed to B/C is infinitesimal and probably counterbalanced by opposites, so the very claim that FDT is optimal for 5 is perhaps a form of Pascal’s mugging.
We can also construct more specific variants of 5 where FDT loses—such as environments where the message at step B is from an anti-Omega which punishes FDT like agents.
FDT uses a sort of universal precommitment: from my understanding it’s something like always honor precommitments your past self would have made (if your past self had your current knowledge). Really evaluating whether adopting that universal precommitment pays off seems rather complex. But naturally a powerful EDT agent will simply adopt that universal precommitment if and when it believes it is in a universe distribution where doing so is optimal! But that does not imply adopting that precommitment is always everywhere optimal.
Sudden thought half a year later:
But what if we restrict reasoning to non-embedded agents? So Omegas of all kinds have access to a perfect Oracle who can predict what you will do, but can’t actually read your thoughts and know that you will do it because you use FDT. I doubt that it is possible in this case to construct a similar anti-FDT situation.
That’s for humans, not abstract agents? Don’t think it matters, we talk about other self-modifications anyway.
Not mine :)
Maybe this interpretation is what repels you? Here’s another 2:
You choose in advance, before you get into (1) or (3), whether to behave like an EDT-agent or like an FDT-agent in the situations where it matters. And you can’t decide, legibly to predictors like the one in this game, to behave like an FDT-agent and then, in the future, when you get into (1) because you’re unlucky, just change your mind. It’s just not an option. And between the options “legibly choose to behave like an EDT-agent” and “legibly choose to behave like an FDT-agent” the second one is clearly better in expectation. You just don’t make another choice in (1) or (2); it’s already decided.
If you find yourself in (1) or (2) you can’t differentiate between the cases “I am the real me” and “I am the model of myself inside the predictor” (because if you could, you could behave differently in these two cases, and it would be a bad model and a bad predictor). So you decide for both at once. (This interpretation doesn’t work well for agents with explicitly self-indicated values (or whatever it is called? I hope it’s clear what I mean).)
Yes. It’s like choosing to win on a 1-5 on a die roll rather than win on a 6. You sacrifice the utility of some future selves (in the worlds where the die rolls 6) in exchange for greater utility to other future selves, and it’s perfectly rational.
Ok, yes. You can do it with all other types of agents too.
I think the ability to legibly adopt such precommitment and willingness to do so kinda turns EDT-agent into FDT-agent.
Yes. I think we are mostly in agreement then. FDT seems to be defined by adopting a form of universal precommitment, which you can only do once and can’t really undo. It seems that EDT can clearly do that (to the extent any agent can adopt FDT), so EDT can always go EDT->FDT, but FDT->EDT is not allowed (or it breaks the universal precommitment or cooperation across instances). That does not resolve the question of whether or not adopting FDT is optimal.
My main point from earlier is this:
The agent in scenario 5 before observing the box and the rules is a superposition of all agents in similar scenarios, and it is only correct for us to judge their performance across that entire set, i.e. according to the agent’s knowledge, not our knowledge. So it’s optimal to take the FDT precommitment in this specific scenario only if it’s optimal to do so over all similar environments, which in this case is nearly all environments as the agent hasn’t observed anything at all at the start of your scenario 5!
So I think this reduces down to the conclusion that FDT and its universal precommitment can’t provide any specific advantage on a specific problem over regular problem-specific precommitments EDT can make, unless it provides a net advantage everywhere across the multiverse, in which case EDT uses that and becomes FDT.
This is a debate about whether these strategies are even relevant and seems to me to have ignored the paper I was sharing. But in any case,
https://arxiv.org/abs/2307.10987
If the best the EDT-agent can do is precommit to behave like FDT-agent or self-modify itself into FDT-agent, it’s weird to say that EDT is better :)
That’s value(action) = sum_j( prob(outcome_j GIVEN action) * D(outcome_j) ). What is D? It is not at all obvious to me that there’s a straightforward way to parameterize D for learning that is self-consistent in moral dilemmas.
That should be U: it is the utility function which computes the utility of a future universe.
EDT chokes on Simpson’s Paradox—specifically, the “Kidney Stone Treatment” example. EDT will only look at the combined data and ignore the confounding variable (the size of the kidney stones), and end up choosing the worse treatment. Which treatment you get doesn’t change whether your kidney stone is large or small, but EDT will make decisions as though it does.
I disagree. A sufficiently powerful EDT reasoner (well before the limit of AIXI) will have no problem choosing the correct action, because it is absolutely not limited to making some decision based purely on the data in that table. So no, it will not “only look at the combined data and ignore ...”, as its world model will predict everything correctly. You can construct a naive EDT that is as dumb as a rock, but that is a fault only of that model, not a fault of EDT as the simple correct decision rule.
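For readers who want the numbers behind this exchange, here is a minimal sketch of the reversal. The thread does not give the table; the counts below are the commonly quoted figures for the kidney stone example (successes, patients), included here as an assumption. The function names are hypothetical.

```python
# (successes, patients) per treatment and stone size; commonly quoted
# figures for the kidney stone example, not taken from this thread.
data = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}

def pooled_rate(treatment: str) -> float:
    # Success rate when stone size is ignored.
    successes = sum(s for s, _ in data[treatment].values())
    patients = sum(n for _, n in data[treatment].values())
    return successes / patients

def stratum_rate(treatment: str, size: str) -> float:
    # Success rate conditioned on stone size as well as treatment.
    successes, patients = data[treatment][size]
    return successes / patients

for t in data:
    print(t, "pooled", round(pooled_rate(t), 3),
          {size: round(stratum_rate(t, size), 3) for size in data[t]})
```

Pooled, B looks better; within each stone size, A looks better. Whether a given EDT agent conditions only on the pooled table or also on stone size depends on its world model, which is exactly the point under dispute in these two comments.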
Parfit’s Hitchhiker? Smoking Lesion?
I always thought AIXI uses CDT because the actions are inputs to the Turing machines rather than outputs, so it’s not looking at short Turing machines that output the action under consideration, the action is a given.
Care to explain why that’s EDT? A link to an existing explanation would be fine.
The actions are inferred from the argmax, but they are also inputs to the prediction models. Thus AIXI is not constrained to avoid updating on its own actions, which allows it to entertain the correct world models for one-boxing, for example. If its world models have learned that Omega never lies and is always correct, those same world models will learn the predictive shortcut that the box content is completely predictable from the action output channel, and thus it will correctly estimate that the one-box branch has higher payout.
The actions sui generis being “inputs to the prediction models” does not distinguish CDT from EDT.
(To be continued, leaving now.)
My understanding is that CDT explicitly disallows acausal predictions—so it disallows models which update on future agent actions themselves, which is important for one boxing.
In EDT/AIXI the world model is allowed to update the hidden box state conditional on the action chosen, even though this is ‘acausal’. It’s equivalent to simply correctly observing that the agent will get higher reward in the subset of the multiverse where the agent decides to one-box.
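The conditional update described here can be sketched numerically. The thread does not state the payoffs, so the code assumes the usual Newcomb setup ($1,000,000 in the opaque box if one-boxing was predicted, $1,000 in the other box) and a predictor with a given accuracy; the function name and parameters are hypothetical.

```python
def edt_newcomb_values(accuracy: float, big: int = 1_000_000, small: int = 1_000) -> dict:
    # Conditioning on the action: P(opaque box is full | one-box) = accuracy,
    # and P(opaque box is full | two-box) = 1 - accuracy.
    one_box = accuracy * big
    two_box = (1 - accuracy) * big + small
    return {"one_box": one_box, "two_box": two_box}

for accuracy in (0.5, 0.9, 1.0):
    print(accuracy, edt_newcomb_values(accuracy))
```

Under these assumed payoffs, one-boxing has the higher conditional expected value once the predictor's accuracy exceeds 0.5005, and with a coin-flip predictor two-boxing comes out ahead.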
In the real world, most variables correlate with each other. If you take the action that correlates most with high utility, you are going to throw away a lot of resources.
Do you have a concrete example?
Like this?
If you are suggesting that as a counterexample (that a powerful Bayesian model-based learning agent, i.e. EDT, would incorrectly believe that stork populations cause human births, or more generally would confuse causation and correlation), then no, I do not agree.
A reasonably powerful world model would correctly predict that changes to stork populations are not directly very predictive of human births.
“A reasonably powerful world model” would correctly predict that being an FDT agent is better than being an EDT agent and modify itself into an FDT agent, because there are several problems where both CDT and EDT fail in comparison with FDT (see the original FDT paper).
False. For example, the Parfit’s Hitchhiker setup doesn’t compare EDT and FDT on exact bit-equivalent environments and action choices; see my reply here.
For the environment where you are stranded in the desert talking with the driver, the optimal implicit action is to agree to pay them, and precommit to this (something humans do without too much trouble all the time). EDT obviously can make that optimal decision given the same decision options that FDT has.
For the environment where you are in the city having already received the ride, and you didn’t already precommit (agree to pay in advance), EDT also makes the optimal action of not paying.
FDT’s supposed superiority is a misdirection based on allowing it new preactions before the main action.