Upvoted; this is a good summary of the issue, and using the new label TDT is arguably more elegant than having to talk separately about the rationality of cultivating a disposition.
How significant are the open questions? We should not expect a correct theory to work in the face of arbitrary acts of Omega. Suppose Omega says “Tomorrow I will examine your source code, and if you don’t subscribe to TDT I will give you $1 million, and if you do subscribe to TDT I will make you watch the Alien movie series—from the third one on”. In this scenario it would be rational to self-modify to something other than TDT; a similar counter can be constructed for any theory whatsoever.
“I just flipped a fair coin. I decided, before I flipped the coin, that if it came up heads, I would ask you for $1000. And if it came up tails, I would give you $1,000,000 if and only if I predicted that you would give me $1000 if the coin had come up heads. The coin came up heads—can I have $1000?”
Does this correspond to a significant class of problem in the real world, in the same way Parfit’s Hitchhiker does?
Right, so the decision theories I try to construct are for classes of problems where I can identify a winning property of how the algorithm decides things or strategizes things or responds to things or whatever, a property which determines the payoff fully and screens off all other dependence on the algorithm. Then the algorithm can maximize that property of itself.
Causal decision theory then corresponds to the problem class where your physical action fully determines the result, and anything else, like logical dependence on your algorithm’s disposition, is not allowed. CDT agents will successfully maximize on that problem class.
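A minimal sketch of that problem class, in Python with made-up actions and payoff numbers: because the payoff is a function of the physical act alone, a plain argmax over actions is all the maximization this class requires.

```python
from typing import Callable, Iterable

def cdt_choose(actions: Iterable[str], payoff: Callable[[str], float]) -> str:
    """Pick whichever action causes the best outcome; in this problem class
    the payoff is a function of the physical action alone."""
    return max(actions, key=payoff)

# Toy example with invented numbers: the action alone determines the payoff,
# so a CDT agent maximizes correctly here.
payoffs = {"press lever A": 10.0, "press lever B": 3.0}
print(cdt_choose(payoffs.keys(), payoffs.get))  # -> "press lever A"
```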
Okay, so what problem class are you aiming for with TDT? It can’t be the full class of problems where the result depends on your disposition, because there will always be a counter. Do you have a slightly more restricted class in mind?
The TDT I actually worked out is for the class where your payoffs are fully determined by the actual output of your algorithm, but not by other outputs that your algorithm would have made under other conditions. As I described in the “open problems” post, once you allow this sort of strategy-based dependence, then I can depend on your dependence on my dependence on … and I don’t yet know how to stop the recursion. This is closely related to what Wei Dai and I are talking about in terms of the “logical order” of decisions.
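For contrast, here is a sketch (Python, Newcomb-style invented payoffs; modeling Omega’s prediction as simply equal to the agent’s output is the assumption) of the class described above: the payoff is fully determined by the actual output of “this computation”.

```python
# Hypothetical sketch: the TDT problem class. The payoff depends only on the
# algorithm's actual output; Omega's prediction is modeled as equal to it.
def newcomb_payoff(my_output: str) -> float:
    prediction = my_output                       # payoff depends (logically) on my output
    box_b = 1_000_000 if prediction == "one-box" else 0
    box_a = 1_000 if my_output == "two-box" else 0
    return box_a + box_b

def tdt_choose(outputs, payoff) -> str:
    """Maximize over what 'this computation' outputs, since the payoff is
    fully determined by that output."""
    return max(outputs, key=payoff)

print(tdt_choose(["one-box", "two-box"], newcomb_payoff))  # -> "one-box"
```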
If you want to use the current TDT for the Prisoner’s Dilemma, you have to start by proving (or probabilistically expecting) that your opponent’s decision is isomorphic to your own. Not by directly simulating the opponent’s attempt to determine if you cooperate only if they cooperate. Because, as written, the counterfactual surgery that stops the recursion is just over “What if I cooperate?” not “What if I cooperate only if they cooperate?” (Look at the diagonal sentence.)
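A sketch of that Prisoner’s Dilemma case, assuming Python and standard invented payoffs: once the opponent’s decision procedure is taken to be isomorphic to yours, the surgery over “What if I cooperate?” sets both outputs at once, and cooperation comes out ahead.

```python
# Hypothetical sketch: Prisoner's Dilemma where the opponent is proven to run
# a decision procedure isomorphic to mine, so our outputs are the same logical fact.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def my_payoff(my_move: str, opponent_isomorphic: bool) -> float:
    # Counterfactual surgery on "What if I output my_move?": under proven
    # isomorphism the opponent's output is set to the same value.
    their_move = my_move if opponent_isomorphic else "D"
    return PAYOFF[(my_move, their_move)]

best = max(["C", "D"], key=lambda m: my_payoff(m, opponent_isomorphic=True))
print(best)  # -> "C": cooperate, since (C, C) beats (D, D)
```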
Okay...
Omega comes along and says “I ran a simulation to see if you would one-box in Newcomb. The answer was yes, so I am now going to feed you to the Ravenous Bugblatter Beast of Traal. Have a nice day.”
Doesn’t this problem fit within your criteria?
If you reject it on the basis of “if you had told me the relevant facts up front, I would’ve made the right decision”, can’t you likewise reject the one where Omega flips a coin before telling you about the proposed bet?
If you have reason in advance to believe that either is likely to occur, you can make an advance decision about what to do.
Does either problem have some particular quality relevant for its classification here, that the other does not?
That’s more like a Counterfactual Mugging, which is the domain of Nesov-Dai updateless decision theory—you’re being rewarded or punished based on a decision you would have made in a different state of knowledge, which is not “you” as I’m defining this problem class. (Which again may sound quite restrictive at this point, but if you look at 95% of the published Newcomblike problems...)
What you need here is for the version of you that Omega simulates facing Newcomb’s Box to know that another Omega is going to reward another version of itself (one that it cares about) based on its current logical output. If the simulated decision system doesn’t know/believe this, then you really are screwed, but it’s more that Omega is now being an unfair bastard (i.e., doing something outside the problem class): you’re being punished based on the output of a decision system that didn’t know about the dependency of that event on its output—sort of like Omega, entirely unbeknownst to you, watching you from a rooftop and sniping you if you eat a delicious sandwich.
If the version of you facing Newcomb’s Problem has a prior over Omega doing things like this, even if the other you’s observed reality seems incompatible with that possible world, then this is the sort of thing handled by updateless decision theory.
Right. But then if that is the (reasonable) criterion under which TDT operates, it seems to me that it does indeed handle the case of Omega’s after the fact coin flip bet, in the same way that it handles (some versions of) Newcomb’s problem. How do you figure that it doesn’t?
Because the decision diagonal I wrote out handles the probable consequences of “this computation” doing something, given its current state of knowledge—its current, updated P. So if it already knows the coinflip (especially a logical coinflip like a binary digit of pi) came up heads, and this coinflip has nothing counterfactually to do with its decision, then it won’t care about what Omega would have done if the coin had come up tails, and the currently executing decision diagonal says “don’t pay”.
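To make that concrete, here is a sketch (Python, with the $1000/$1,000,000 stakes from the quoted coin-flip problem) of what the already-updated diagonal computes: with P(heads) updated to roughly 1, the tails branch contributes nothing, so “don’t pay” wins, even though the paying policy looks better when the same formula is evaluated with the prior 50/50 probabilities.

```python
# Hypothetical sketch: expected utility of paying, before and after updating
# on the observation "the coin came up heads".
def expected_utility(pay: bool, p_heads: float = 1.0) -> float:
    heads_outcome = -1_000.0 if pay else 0.0        # asked for $1000: pay or refuse
    tails_outcome = 1_000_000.0 if pay else 0.0     # Omega would reward a payer
    return p_heads * heads_outcome + (1.0 - p_heads) * tails_outcome

print(expected_utility(True), expected_utility(False))              # -1000.0 0.0 -> don't pay
print(expected_utility(True, p_heads=0.5))                          # 499500.0 -> paying wins ex ante
```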
Ah! So you’re defining “this” as an exact bitwise match, I see. Certainly that helps make the conclusions more rigorous. I will suggest that the way to handle the after-the-fact coin-flip bet is to make the natural extension to sufficiently similar computations.
Note that even selfish agents must do this in order to care about themselves five minutes in the future.
To further motivate the extension, consider the variant of Newcomb where just before making your choice, you are given a piece of paper with a large number written on it; the number has been chosen to be prime or composite depending on whether the money is in the opaque box.
Ah! So you’re defining “this” as an exact bitwise match

That’s not the problem. The problem is that you’ve already updated your probability distribution, so you just don’t care about the cases where the binary digit came up 0 instead of 1: not because your utility function isn’t over them, but because they have negligible probability.
the number has been chosen to be prime or composite depending on whether the money is in the opaque box

(First read that variant in Martin Gardner.) The epistemically intuitive answer is “Once I choose to take one box, I will be able to infer that this number has always been prime”. If I wanted to walk through TDT doing this, I’d draw a causal graph with Omega’s choice descending from my decision diagonal, and sending a prior-message in turn to the parameters of a child node that runs a primality test over numbers and picked this number because it passed (failed), so that—knowing/having decided your logical choice—seeing this number becomes evidence that its primality test came up positive.
In terms of logical control, you don’t control whether the primality test comes up positive on this fixed number, but you do control whether this number got onto the box-label by passing a primality test or a compositeness test.
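For concreteness, a sketch of that child node’s selection step (Python; sympy is an assumed dependency and omega_label is a made-up name): the number you are handed was produced by whichever test Omega’s prediction selected, and that selection is what your logical output controls.

```python
import random
from sympy import isprime  # assumed dependency for the primality test

def omega_label(predicted_one_box: bool, rng=random) -> int:
    """Hypothetical sketch of the child node: keep drawing numbers until one
    passes the test selected by Omega's prediction (prime iff one-boxing)."""
    while True:
        n = rng.randrange(10**6, 10**7)
        if isprime(n) == predicted_one_box:
            return n

# You don't control isprime(n) for the fixed n you are shown; you control
# which branch of the selection above produced it, so deciding to one-box
# makes "this number is prime" the thing you expect to find.
```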
(I don’t remember where I first read that variant, but Martin Gardner sounds likely.) Yes, I agree with your analysis of it—but that doesn’t contradict the assertion that you can solve these problems by extending your utility function across parallel versions of you who received slightly different sensory data. I will conjecture that this turns out to be the only elegant solution.
Sorry, that doesn’t make any sense. It’s a probability distribution that’s the issue, not a utility function. UDT tosses out the probability distribution entirely. TDT still uses it and therefore fails on Counterfactual Mugging.
It’s precisely the assertion that all such problems have to be solved at the probability distribution level that I’m disputing. I’ll go so far as to make a testable prediction: it will eventually be acknowledged that the notion of a purely selfish agent is a good approximation that nonetheless cannot handle such extreme cases. If you can come up with a theory that handles them all without touching the utility function, I will be interested in seeing it!
None of the decision theories in question assume a purely selfish agent.
No, but most of the example problems do.
I will suggest that the way to handle the after-the-fact coin-flip bet is to make the natural extension to sufficiently similar computations.

It might be nontrivial to do this in a way that doesn’t automatically lead to wireheading (using all available power to simulate many extremely fulfilled versions of itself). Or is that problem even more endemic than this?
“if you had told me the relevant facts up front, I would’ve made the right decision”

This is a statement about my global strategy, the strategy I consider winning. In this strategy, I one-box in the states of knowledge where I don’t know about the monster, and two-box where I know. If Omega told me about the monster, I’d transition to a state of knowledge where I know about it, and, according to the fixed strategy above, I two-box.

In counterfactual mugging, for each instance of mugging, I give away $100 on the mugging side, and receive $10,000 on the reward side. This is also a fixed global strategy that gives the actions depending on the agent’s state of knowledge.
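A sketch of such a fixed global strategy, assuming Python and the $100/$10,000 stakes above: the strategy is just a mapping from knowledge states to actions, and Counterfactual Mugging is scored over both sides of the coin rather than from within one updated state.

```python
# Hypothetical sketch: a global strategy maps the agent's state of knowledge
# to an action; Counterfactual Mugging is scored under the prior over states.
STRATEGY = {
    "know about monster": "two-box",
    "don't know about monster": "one-box",
    "mugging side (coin came up heads)": "pay $100",
    "reward side (coin came up tails)": "collect",
}

def counterfactual_mugging_value(strategy: dict) -> float:
    pays = strategy["mugging side (coin came up heads)"] == "pay $100"
    return 0.5 * (-100.0 if pays else 0.0) + 0.5 * (10_000.0 if pays else 0.0)

print(counterfactual_mugging_value(STRATEGY))  # 4950.0 per instance, ex ante
```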
We already have Disposition-Based Decision Theory—and have had since 2002 or so. I think it’s more a case of whether there is anything more to add.
Thanks for the link! I’ll read the paper more thoroughly later, a quick skim suggests it is along the same lines. Are there any cases where DBDT and TDT give different answers?
I don’t think DBDT gives the right answer if the predictor’s snapshot of the local universe-state was taken before the agent was born (or before humans evolved, or whatever), because the “critical point”, as Fisher defines it, occurs too late. But a one-box chooser can still expect a better outcome.
It looks to me like DBDT is working in the direction of TDT but isn’t quite there yet. It looks similar to the sort of reasoning I was talking about earlier, where you try to define a problem class over payoff-determining properties of algorithms.
But this isn’t the same as a reflectively consistent decision theory, because you can only maximize on the problem class from outside the system—you presume an existing decision process or ability to maximize, and then maximize the dispositions using that existing decision theory. Why not insert yet another step? What if one were to talk about dispositions to choose particular disposition-choosing algorithms as being rational? In other words, maximizing “dispositions” from outside strikes me as close kin to “precommitment”—it doesn’t so much guarantee reflective consistency of viewpoints, as pick one particular viewpoint to have control.
As Drescher points out, if the base theory is a CDT, then there’s still a possibility that DBDT will end up two-boxing if Omega takes a snapshot of the (classical) universe a billion years ago before DBDT places the “critical point”. A base theory of TDT, of course, would one-box, but then you don’t need the edifice of DBDT on top because the edifice doesn’t add anything. So you could define “reflective consistency” in terms of “fixed point under precommitment or disposition-choosing steps”.
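A tiny sketch of that proposed definition, assuming Python (both names are made up): reflective consistency as being a fixed point of whatever precommitment or disposition-choosing step you might apply from outside.

```python
# Hypothetical sketch: "reflective consistency" as a fixed point under
# precommitment / disposition-choosing steps.
def reflectively_consistent(theory, disposition_step) -> bool:
    """True if letting the theory re-choose its own dispositions from outside
    leaves it unchanged (so a DBDT-style edifice on top adds nothing)."""
    return disposition_step(theory) == theory
```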
TDT is validated by the sort of reasoning that goes into DBDT, but the TDT algorithm itself is a plain-vanilla non-meta decision theory which chooses well on-the-fly without needing to step back and consider its dispositions, or precommit, etc. The Buck Stops Immediately. This is what I mean by “reflective consistency”. (Though I should emphasize that so far this only works on the simple cases that constitute 95% of all published Newcomblike problems; in complex cases like the ones Wei Dai and I are talking about, I don’t know any good fixed algorithm, let alone a single-step non-meta one.)
Exactly. Unless “cultivating a disposition” amounts to a (subsequent-choice-circumventing) precommitment, you still need a reason, when you make that subsequent choice, to act in accordance with the cultivated disposition. And there’s no good explanation for why that reason should care about whether or not you previously cultivated a disposition.
(Though I think the paper was trying to use dispositions to define “rationality” more than to implement an agent that would consistently carry out those dispositions?)
I didn’t really get the purpose of the paper’s analysis of “rationality talk”. Ultimately, as I understood the paper, it was making a prescriptive argument about how people (as actually implemented) should behave in the scenarios presented (i.e., the “rational” way for them to behave).
That’s just what “dispositions” are in this context—tendencies to behave in particular ways under particular circumstances.
By this conception of what “disposition” means, you can’t cultivate a disposition for keeping promises—and then break the promises when the chips are down. You are either disposed to keep promises, or you are not.
I had a look at the Wikipedia “Precommitment” article to see whether precommitment is really as inappropriate a term as it is being portrayed to be here.
According to the article, the main issue seems to involve cutting off your own options.
Is a sensible one-boxing agent “precommitting” to one-boxing by “cutting off its own options”—namely the option of two-boxing?
On one hand, they still have the option and a free choice when they come to decide. On the other hand, the choice has been made for them by their own nature—and so they don’t really have the option of choosing any more.
My assessment is that the word is not obviously totally inappropriate.
Does “disposition” have the same negative connotations as “precommitting” has? I would say not: “disposition” seems like a fairly appropriate word to me.
I don’t know if Justin Fisher’s work exactly replicates your own conclusions. However it seems to have much the same motivations, and to have reached many of the same conclusions.
FWIW, it took me about 15 minutes to find that paper in a literature search.
Another relevant paper:
“No Regrets, or: Edith Piaf Revamps Decision Theory”.
That one seems to have christened what you tend to refer to as “consistency under reflection” as “desire reflection”.
I don’t seem to like either term very much—but currently don’t have a better alternative to offer.
Violation of desire reflection would be a sufficient condition for violation of dynamic consistency, which in turn is a sufficient condition to violate reflective consistency. I don’t see a necessity link.
The most obvious reply to the point about dispositions to have dispositions is to take a behaviourist stance: if a disposition results in particular actions under particular circumstances, then a disposition to have a disposition (plus the ability to self-modify) is just another type of disposition, really.
What the document says about the placing of the “critical point” is:
DBDT defines the critical point of a given scenario description as the most recent time prior to the choice in question which would have been a natural opportunity for the normal shaping of dispositions. I will say more about critical points in the next section. For now, let us take it for granted that, in short-duration scenarios like Newcomb’s problem and the psychologically-similar prisoners’ dilemma, the critical point comes prior to the first events mentioned in standard descriptions of these scenarios. (See Figure 1.)

Consequently, I am not sure where the idea that it could be positioned “too late” comes from. The document pretty clearly places it early on.
Newcomb’s problem? That’s figure 1. You are saying that you can’t easily have a disposition—before you even exist? Just so—unless your maker had a disposition to make you with a certain disposition, that is.
Well, we have a lengthy description of the revised DBDT—so that should hopefully help figure out what its predicted actions are.
The author claims it gets both the Smoking-Cancer Problem and Newcomb’s problem right—which seems to be a start.