I would love to debate you on this. My view: there is no single known problem in which FDT makes an incorrect decision. I have thought about FDT a lot and it seems quite obviously correct to me.
You should take a look at this list of UDT open problems that Vladimir Slepnev wrote 13 years ago, where 2 and 3 are problems in which UDT/FDT seemingly make incorrect decisions, and 1 and 5 are definitely also serious open problems.
Here’s what infra-Bayesianism has to say about these problems:
Understanding multi-agent scenarios is still an open problem in general, but a recent line of inquiry (hopefully I will write about it in a month or so) leads me to believe that, at least in repeated games, IB agents converge to outcomes that are (i) nearly Pareto efficient and (ii) above the maximin of each player (and arguably this might generalize to one-shot games as well, under reasonable assumptions). However, which Pareto efficient outcome results depends on the priors (and maybe also on other details of the learning algorithm) those agents have. In the three-player prisoner’s dilemma you described, CCD is an outcome satisfying (i)+(ii), as opposed to CD in a two-player PD, which fails condition (ii); see the small numerical check after this list. Therefore, it is an outcome that can arise with three IB players, and being a “CDT” (or any kind of defector) is not strictly superior. Moreover, it seems natural that the precise outcome depends on priors: depending on what kind of agents you expect to meet, it might be better to drive a “harder” or “softer” bargain.
This depends on how you operationalize ASP. One operationalization is: consider a repeated Newcomb’s problem, where the Predictor receives your state on every round and does a weak analysis of it, filling the box only if the weak analysis is sufficient to confirm you will one-box. In this case, a metacognitive IB agent with a sufficiently weak core will learn to win by not querying the envelope in order to make itself more predictable. (This idea appears in some comment I made a while ago, which I can’t locate atm.) Now, this still requires that the core is sufficiently weak. However, I think this type of requirement is necessary, since there has to be some cutoff for what counts as a predictor.
This again comes down to selecting among Pareto efficient outcomes, so the comments in #1 apply.
This is not a utility function compatible with IB, nor do I see why such utility functions must be admissible. For a metacognitive agent, we can write a utility function that rewards finding an inconsistency after time $t$ by $f(t)$ where $f$ is some function that goes to 0 as its argument goes to infinity.
See comments in #1.
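To make #1 a bit more concrete, here is a small brute-force check. It assumes a standard parameterization of the n-player Prisoner’s Dilemma, where cooperating costs the cooperator a fixed amount and gives a larger benefit to each other player (not necessarily the exact payoffs from the original post), and it checks Pareto efficiency among pure outcomes only:

```python
from itertools import product

# Assumed payoffs: cooperating costs the cooperator COST and gives BENEFIT
# to each *other* player. Not necessarily the payoff matrix of the original post.
BENEFIT, COST = 2.0, 1.0

def payoffs(profile):  # profile is a tuple of "C"/"D", one entry per player
    n = len(profile)
    return tuple(
        BENEFIT * sum(1 for j in range(n) if j != i and profile[j] == "C")
        - (COST if profile[i] == "C" else 0)
        for i in range(n)
    )

def maximin(n, player):
    # Pure-strategy maximin: the best worst-case payoff over the player's own moves.
    return max(
        min(payoffs(p)[player] for p in product("CD", repeat=n) if p[player] == a)
        for a in "CD"
    )

def pareto_efficient(profile):
    # Checked among pure outcomes only.
    n, base = len(profile), payoffs(profile)
    for other in product("CD", repeat=n):
        alt = payoffs(other)
        if all(alt[i] >= base[i] for i in range(n)) and any(alt[i] > base[i] for i in range(n)):
            return False
    return True

ccd = ("C", "C", "D")
print(payoffs(ccd))                       # (1.0, 1.0, 4.0)
print([maximin(3, i) for i in range(3)])  # [0.0, 0.0, 0.0] -> CCD is above each maximin
print(pareto_efficient(ccd))              # True -> CCD satisfies (i)+(ii)

cd = ("C", "D")
print(payoffs(cd), maximin(2, 0))         # (-1.0, 2.0) 0.0 -> the cooperator in CD is
                                          # below their maximin, so CD fails (ii)
```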
Why the focus on repeated games? It seems like one of the central motivations for people to be interested in logical decision theories (TDT/UDT/FDT) is that they recommend one-boxing and playing C in PD (against someone with similar decision theory), even in non-repeated games, and you’re not addressing that?
First, one problem with classical thinking about decision theory is that it assumes perfect separation between learning models and deciding based on those models. Moreover, different decision theories use different type signatures for what a “model” is: in EDT it’s a probability distribution, in CDT it’s a causal network (interpreted as physical causality), in FDT it’s a “logical” causal network. All of this without giving any consideration to whether learning models of this type is even possible, or whether the inherent ambiguity makes some questions in decision theory moot.
In order to consider learning as part of the process, we need to set up the problem in a way that allows gathering evidence. The simplest way to do it is iterating the original problem. This quickly leads to insights that show that the perfect separation assumption is deeply flawed. For example, consider the iterated Newcomb’s problem. On every round, the agent selects among two actions (one-box and two-box), makes one of two possible observations (empty-box or full-box) and gets a reward accordingly. Remarkably, virtually any reasonable reinforcement learning (RL) algorithm converges to one-boxing[1].
This happens despite the fact that RL algorithms can be naively thought of as CDT: there’s an infinite repetitive causal network, with causation flowing from action to unobservable state, from unobservable state to observation, and from previous unobservable state to next unobservable state. The reason for the apparent paradox is that inferring causation from statistical correlations leads to labeling some kinds of logical causation as causation as well. This already shows that the EDT/CDT-style taxonomy gets murky and inadequate when describing learning agents.
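To illustrate, here is a minimal simulation under one simple operationalization (the payoffs, the perfect per-round Predictor, and the epsilon-greedy average-reward learner are all illustrative assumptions; the claim is not specific to them):

```python
import random

ACTIONS = ("one-box", "two-box")

def reward(action):
    # The Predictor is perfect: the opaque box holds 1,000,000 exactly when
    # the agent one-boxes this round; the transparent box always holds 1,000.
    big = 1_000_000 if action == "one-box" else 0
    return big if action == "one-box" else big + 1_000

def run(rounds=10_000, eps=0.1, seed=0):
    rng = random.Random(seed)
    total = {a: 0.0 for a in ACTIONS}
    count = {a: 1e-9 for a in ACTIONS}
    for _ in range(rounds):
        explore = rng.random() < eps
        a = rng.choice(ACTIONS) if explore else max(ACTIONS, key=lambda x: total[x] / count[x])
        r = reward(a)
        total[a] += r
        count[a] += 1
    return {a: round(total[a] / count[a]) for a in ACTIONS}

print(run())
# Estimated values: one-box ~1,000,000 vs two-box ~1,000, so the greedy choice
# locks onto one-boxing: the action-reward correlation induced by the
# prediction is treated exactly as if it were a causal effect.
```

The point is not the specific algorithm: any learner that exploits the observed action-reward statistics ends up one-boxing here.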
However, some Newcombian scenarios don’t yield that easily. For example, in counterfactual mugging RL fails. It also fails in iterated XOR blackmail[2].
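For concreteness, here is one way to see the failure in iterated counterfactual mugging (the stakes and the predictor’s rule below are illustrative assumptions): the big payout only ever arrives on rounds where the agent takes no relevant action, so a learner that credits reward to the action taken in the same round converges to refusing, while a committed payer does far better.

```python
import random

def average_reward(policy, rounds=100_000, seed=0):
    rng = random.Random(seed)
    predicted_pay = 0.5          # predictor's running estimate of the agent's pay frequency
    value = {"pay": 0.0, "refuse": 0.0}
    count = {"pay": 1e-9, "refuse": 1e-9}
    total = 0.0
    for _ in range(rounds):
        if rng.random() < 0.5:   # heads: the agent is asked for 100
            a = policy(value, count, rng)
            r = -100 if a == "pay" else 0
            value[a] += r
            count[a] += 1        # reward is credited to this round's action...
            predicted_pay = 0.99 * predicted_pay + 0.01 * (a == "pay")
        else:                    # tails: the payout depends only on the prediction,
            r = 10_000 if predicted_pay > 0.5 else 0   # ...never on a same-round action
        total += r
    return total / rounds

def value_learner(value, count, rng):
    # Epsilon-greedy on the estimated value of this round's action.
    if rng.random() < 0.1:
        return rng.choice(("pay", "refuse"))
    return max(value, key=lambda a: value[a] / count[a])

def committed_payer(value, count, rng):
    return "pay"

print(round(average_reward(value_learner)))    # learns to refuse: roughly 0 (slightly negative)
print(round(average_reward(committed_payer)))  # about +4,950 per round
```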
Enter infra-Bayesianism. The motivation is solving the nonrealizability problem. As a “side effect”, (iterated/learnable) Newcombian scenarios get solved as well[3], at least as long as they obey a condition called “pseudocausality”[4]. Here, by “solved” I mean that convergence to the FDT payoff is ensured. This side effect is not accidental, because the existence of powerful predictors in the environment is indeed a nonrealizable setting.
Second, another line of thinking which leads to iterated games is Abram’s thesis that in logical time, all games are iterated games. I now consider this insight even more powerful than I realized at the time.
A key conjecture informing my thinking about intelligence is: the core of intelligence lies in learning algorithms. In particular, the ability to solve problems by deductive reasoning is also the product of a learning algorithm that converges to deductive reasoning (augmented by some way of selecting promising deductive steps) as an effective method for solving various problems. Notably, this frees us from committing to a particular system of axioms: learning algorithms can empirically decide which axiom systems are more useful than others.
However, classical learning theory flounders against the fact that real life priors are unlearnable, because of traps: actions can lead to irreversible long-term harm. But, we can rescue the power of learning algorithms via the metacognitive agents framework. The latter allows us to split learning into a “logical” part (imagine doing thought experiments in your head) and a “physical” part, where the latter navigates traps in a particular way that the former has learned. In machine learning parlance, the first part is “metalearning” and the second part is “object-level learning”. (See my recent talk for some more details.)
This formalism for metacognition is also a realization of Abram’s idea. The role of logical time is played by the amount of computing resources available for metalearning. Indeed, my recent line of inquiry about repeated games (working name: “infra-Bayesian haggling”) is applicable in this framework, enabling the same conclusions for one-shot games. However, there is still the major challenge of solving games where we have one-shot transparent-source-code interactions with individual agents, but we also need to learn the semantics of source code empirically by interacting with many different agents over time (the latter is just the simplest way of operationalizing the gathering of empirical evidence about semantics). For this I only have vague ideas, even though I believe a solution exists.
Third, repeated games are an easier but still highly non-trivial problem, so they are a natural starting point for investigation. One of the reasons for the LessWrongian focus on one-shot games was (I think) that in the iterated Prisoner’s Dilemma (IPD), cooperation is already “solved” by tit-for-tat. However, the latter is not really true: while tit-for-tat is a Nash equilibrium in the γ→1 limit[5], defect-defect is also a Nash equilibrium. Worse, if instead of geometric time discount we use step-function time discount, defect-defect becomes the only Nash equilibrium (although tit-for-tat is at least an ϵ-Nash equilibrium, with ϵ vanishing in the long horizon limit). Also worse, we don’t have a learning-theoretic explanation of convergence even to Nash equilibria, because of the grain-of-truth problem!
The grain-of-truth problem is just a special case of the nonrealizability problem. So it’s natural to expect that infra-Bayesianism (IB) should help. Indeed, IB more or less solves convergence to Nash equilibria in repeated two-player games off the bat, but in a rather disappointing manner: due to the folk theorem, any payoff above the maximin is a Nash payoff, and IB indeed guarantees the maximin payoff. This doesn’t require anything resembling learning a theory of mind, just learning the rules of the game. Getting any stronger results proved difficult, until very recently.
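As a finite-horizon numerical sanity check (not a proof) of the equilibrium claims above, with the standard one-shot payoffs T=5, R=3, P=1, S=0 and γ=0.99: among a few candidate deviation strategies, no unilateral deviation improves on (tit-for-tat, tit-for-tat) or on (always-defect, always-defect), and the one-shot maximin that IB guarantees is just the mutual-defection payoff.

```python
T, R, P, S = 5, 3, 1, 0          # standard one-shot PD payoffs (assumed for illustration)
GAMMA, HORIZON = 0.99, 2_000     # finite truncation of the infinite discounted sum

PAYOFF = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}

def discounted(strat_a, strat_b):
    """Discounted payoff to player A; a strategy maps the opponent's history to a move."""
    hist_a, hist_b, total = [], [], 0.0
    for t in range(HORIZON):
        a, b = strat_a(hist_b), strat_b(hist_a)
        total += (GAMMA ** t) * PAYOFF[(a, b)]
        hist_a.append(a)
        hist_b.append(b)
    return total

tit_for_tat   = lambda opp: "C" if not opp else opp[-1]
always_defect = lambda opp: "D"
always_coop   = lambda opp: "C"
alternate     = lambda opp: "D" if len(opp) % 2 == 0 else "C"
candidates = [tit_for_tat, always_defect, always_coop, alternate]

for name, eq in [("tit-for-tat", tit_for_tat), ("always-defect", always_defect)]:
    baseline = discounted(eq, eq)
    best_deviation = max(discounted(dev, eq) for dev in candidates)
    # In both cases best_deviation <= baseline, so both symmetric profiles are
    # (approximate) Nash equilibria at gamma = 0.99.
    print(f"{name}: equilibrium payoff {baseline:.0f}, best deviation {best_deviation:.0f}")

# The per-round maximin (the floor IB guarantees) is only the mutual-defection payoff:
print(max(min(PAYOFF[(a, b)] for b in "CD") for a in "CD"))  # 1
```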
Enter infra-Bayesian haggling. Now[6], IB agents converge[7] to Pareto efficient outcomes.
A major caveat is, infra-Bayesian haggling still looks like it isn’t really learning a theory of mind, just sort of trial-and-erroring until something sticks. I suspect that a solution to the harder problem I mentioned before (learning the semantics of source code in one-shot transparent interactions with many agents) would look more theory-of-mind-esque.
So, there is still much work ahead but I feel that drawing some insight about your open problems is already somewhat justified.
It’s also interesting to consider the impact on human intuition. I suspect it’s much harder to feel that two-boxing makes sense when the experiment repeats every day, and every time you one-box you get out with much more money.
Perhaps we can rescue the classical taxonomy by declaring that RL is just EDT. But then the question is what kind of learning agent is CDT, if any. The classical Pearlian methods for causal inference don’t seem to help, unless you assume the ability to magically intervene on arbitrary variables, which is completely unjustified. As we’ve seen, the agent’s selection of its own actions cannot be interpreted as intervention, if we want to rescue CDT while preserving its insistence that causality is physical.
Also interesting is that the representation of Newcombian scenarios as an ultra-POMDP, allowing for homogeneous ultradistributions (this obviates the need for the so-called “Nirvana trick”), is a natural reflection of the formulation of the problem in terms of predictors. In particular, the original Newcomb’s problem can be represented in this manner, which in some sense preserves the separation between “logical” and “physical” causality (although this alternative representation is observationally indistinguishable; but maybe it’s a hint that IB agents would have an easier time learning it, given additional relevant cues).
Which rules out e.g. full-box-dependent transparent Newcomb, while allowing empty-box-dependent transparent Newcomb or full-box-dependent transparent Newcomb with noise. Pseudocausality is (borrowing the language of the original Timeless Decision Theory paper) a “fairness condition”, weaker than that required by e.g. CDT but a little stronger than the one FDT is supposed to have. There are variants of infra-Bayesianism with a weaker fairness condition, including those specifically constructed for this purpose (noisy infra-Bayesianism and infra-Bayesianism with infinitesimals), which might be contrived, but also infra-Bayesian physicalism (originally motivated by naturalized induction and related issues). However, we haven’t yet described the learning theory of the latter.
Notably, this is the same limit that we consider in learning theory.
I claim. I haven’t actually written out the full proof rigorously, and it will probably take me a while to get around to that. The deterministic/sharp-hypotheses case seems fairly straightforward but the stochastic-hypotheses case might be quite hairy.
In some limit. Under some non-trivial and yet-to-be-fully-understood but arguably-plausible assumptions about the learning algorithm, and some reasonable assumptions about the prior.
Thanks for the link!
And hmm, it seems to me FDT one-boxes on ASP, as that gives the most utility from the subjunctive dependence perspective.
Oh wait, of course, in this problem, Omega doesn’t simulate the agent. Interesting! I’ll have to think about this more :-)
Ah, I just read your substack post on this, and you’ve referenced two pieces I’ve already reacted to (and in my view debunked) before. Seems like we could have a good debate on this :)
How’d you feel about a verbal debate?
I have written before why FDT is relevant to solving the alignment problem. I’d be happy to discuss that with you.