Right, you can modify the function that evaluates outcomes to change the payoffs (e.g. by making exploitation in the PD have a lower payoff than mutual cooperation, because it “sullies your honor” or whatever) and then CDT will perform correctly. But this is trivially true: I can of course cause that equation to give me the “right” answer by modifying D(O_j) to assign 1 to the “right” outcome and 0 to all other outcomes. The question is how you go about modifying D to identify the “right” answer.
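A minimal sketch of the point being made, assuming the equation in question is the usual causal expected-utility formula EU(a) = Σ_j P(O_j | do(A=a)) · D(O_j); the outcome probabilities and payoff numbers below are made up for illustration, not taken from this exchange.

```python
# Sketch: causal expected utility, and how rigging D(O_j) trivially forces any answer.
# The opponent model and probabilities below are illustrative assumptions.

OUTCOMES = ["mutual_cooperation", "exploit", "exploited", "mutual_defection"]

# P(O_j | do(A = a)) for a one-shot PD against some fixed opponent (illustrative numbers).
P_GIVEN_DO = {
    "cooperate": {"mutual_cooperation": 0.5, "exploited": 0.5, "exploit": 0.0, "mutual_defection": 0.0},
    "defect":    {"exploit": 0.5, "mutual_defection": 0.5, "mutual_cooperation": 0.0, "exploited": 0.0},
}

def expected_utility(action, D):
    """EU(a) = sum over j of P(O_j | do(A=a)) * D(O_j)."""
    return sum(P_GIVEN_DO[action][o] * D[o] for o in OUTCOMES)

def best_action(D):
    return max(P_GIVEN_DO, key=lambda a: expected_utility(a, D))

# Ordinary PD-style valuation: exploitation pays best, so CDT defects.
D_pd = {"exploit": 5, "mutual_cooperation": 3, "mutual_defection": 1, "exploited": 0}

# The trivial fix: assign 1 to the outcome you have decided is "right" and 0 to the rest.
D_rigged = {o: (1 if o == "mutual_cooperation" else 0) for o in OUTCOMES}

print(best_action(D_pd))      # defect
print(best_action(D_rigged))  # cooperate -- but only because D was rigged by hand
```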
I agree that in sufficiently repetitive environments CDT readily modifies the D function to alter the apparent payoffs in PD-like problems (via “precommitments”), but this is still an unsatisfactory hack.
First of all, the construction of the graph is part of the decision procedure. Sure, in certain situations CDT can fix its flaws by hiding extra logic inside D. However, I’d like to know what that logic is actually doing so that I can put it in the original decision procedure directly.
Secondly, CDT can’t (or, rather, wouldn’t) fix all of its flaws by modifying D—it has some blind spots, which I’ll go into later.
Outside of supernatural opportunities, it’s not obvious to me that this is a bug. I’ll wait for you to make the future arguments at length, unless you want to give a brief version.
(I don’t understand where your objection is here. What do you mean by ‘supernatural’? Do you think you should always two-box in a Newcomb’s problem where Omega is played by Paul Ekman, a good but imperfect predictor?)
You find yourself in a PD against a perfect copy of yourself. At the end of the game, I will remove the money your clone wins, destroy all records of what you did, re-merge you with your clone, erase both our memories of the process, and let you keep the money that you won (you will think it is just a gift to recompense you for sleeping in my lab for a few hours). You had not previously considered this situation possible, and had made no precommitments about what to do in such a scenario. What do you think you should do?
Also, what do you think the right move is on the true PD?
You find yourself in a PD against a perfect copy of yourself. At the end of the game, I will remove the money your clone wins, destroy all records of what you did, re-merge you with your clone, erase both our memories of the process, and let you keep the money that you won (you will think it is just a gift to recompense you for sleeping in my lab for a few hours). You had not previously considered this situation possible, and had made no precommitments about what to do in such a scenario. What do you think you should do?
Given that you’re going to erase my memory of this conversation and burn a lot of other records afterward, it’s entirely possible that you’re lying about whether it’s me or the other me whose payout ‘actually counts.’ Makes no difference to you either way, right? We all look the same, and telling us different stories about the upcoming game would break the assumption of symmetry. Effectively, I’m playing a game of PD followed by a special step in which you flip a fair coin and, on heads, swap my reward with that of the other player.
So, I’d optimize for the combined reward to both myself and my clone, which is to say, for the usual PD payoff matrix, cooperate. If the reward for defecting when the other player cooperates is going to be worth drastically more to my postgame gestalt, to the point that I’d accept a 25% or less chance of that payout in trade for virtual certainty of the payout for mutual cooperation, I would instead behave randomly.
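A worked version of that calculation, under two assumptions that are not in the problem statement: both copies randomize independently at 50/50, and the swap coin is fair. The payoff values are the conventional illustrative PD numbers (T > R > P > S), not anything specified above.

```python
# Sketch: where the "25% or less" threshold comes from, assuming independent 50/50
# randomization by both copies and a fair swap coin. Payoff numbers are assumed.
import itertools

T, R, P, S = 5, 3, 1, 0   # temptation, reward, punishment, sucker -- illustrative values

def payoff(me, other):
    return {("D", "C"): T, ("C", "C"): R, ("D", "D"): P, ("C", "D"): S}[(me, other)]

def final_reward(me, other, swap):
    """After the game, a fair coin decides whether I keep my payout or my clone's."""
    return payoff(other, me) if swap else payoff(me, other)

# Both copies cooperate: I end up with R for certain, swap or no swap.
assert final_reward("C", "C", swap=False) == final_reward("C", "C", swap=True) == R

# Both copies randomize independently 50/50, then the swap coin is flipped.
outcomes = [final_reward(me, other, swap)
            for me, other, swap in itertools.product("CD", "CD", (False, True))]
p_temptation = outcomes.count(T) / len(outcomes)
expected = sum(outcomes) / len(outcomes)

print(p_temptation)  # 0.25 -- the 25% chance of walking away with the exploit payout
print(expected)      # (T + R + P + S) / 4, which is worse than R unless T is huge
```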
Saying “I wouldn’t trust someone like that to tell the truth about whose payout counts” is fighting the hypothetical.

That they either must both hear the same story or else break the assumption of symmetry is an important objection to the hypothetical. Either choice breaks the problem statement as presented.
Thank you! If I was the other clone and heard that I was about to play a game of PD which would have no consequences for anyone except the other player, who was also me, that would distort my incentives.
It’s established in the problem statement that the experimenter is going to destroy or falsify all records of what transpired during the game, including the fact that a game even took place, presumably to rule out cooperation motivated by reputational effects. If you want a perfectly honest and trustworthy experimenter, establish that axiomatically, or at least don’t establish anything that directly contradicts.
I don’t think you need to assume the other party is a clone; you just need to assume that both you and the other party are perfect reasoners.

Assuming that the other party is a clone with identical starting mind-state makes it a much more tractable problem. I don’t have much idea how perfect reasoners behave; I’ve never met one.
Right, you can modify the function that evaluates outcomes to change the payoffs (e.g. by making exploitation in the PD have a lower payoff than mutual cooperation, because it “sullies your honor” or whatever) and then CDT will perform correctly. But this is trivially true: I can of course cause that equation to give me the “right” answer by modifying D(O_j) to assign 1 to the “right” outcome and 0 to all other outcomes. The question is how you go about modifying D to identify the “right” answer.
I agree with this. It seems to me that answers about how to modify D are basically questions about how to model the future; you need to price the dishonor in defecting, which seems to me to require at least an implicit model of how valuable honor will be over the course of the future. By ‘honor,’ I just mean a computational convenience that abstracts away a feature of the uncertain future, not a terminal value. (Humans might have this built in as a terminal value, but that seems to be because it was cheaper for evolution to do so than the alternative.)
I agree that in sufficiently repetitive environments CDT readily modifies the D function to alter the apparent payoffs in PD-like problems (via “precommitments”), but this is still an unsatisfactory hack.
I don’t think I agree with the claim that this is an unsatisfactory hack. To switch from decision-making to computer vision as the example, I hear your position as saying that neural nets are unsatisfactory for solving computer vision, so we need to develop an extension, and my position as saying that neural nets are the right approach, but we need very wide nets with very many layers. A criticism of my position could be “but of course with enough nodes you can model an arbitrary function, and so you can solve computer vision like you could solve any problem,” but I would put forward the defense that complicated problems require complicated solutions; it seems more likely to me that massive databases of experience will solve the problem than improved algorithmic sophistication.
I don’t understand where your objection is here. What do you mean by ‘supernatural’?
In the natural universe, it looks to me like opportunities that promise retrocausation turn out to be scams, and this is certain enough to be called a fundamental property. In hypothetical universes, this doesn’t have to be the case, but it’s not clear to me how much effort we should spend on optimizing hypothetical universes. In either case, it seems to me this is something that the physics module (i.e. what gives you P(O_j|do(A))) should compute, and only baked into the decision theory by the rules about what sort of causal graphs you think are likely.
Do you think you should always two-box in a Newcomb’s problem where Omega is played by Paul Ekman, a good but imperfect predictor?
Given that professional ethicists are neither nicer nor more dependable than otherwise similar people of the same background, I’ll jump on the signalling grenade to point out that any public discussion of these sorts of questions is poisoned by signalling. If I expected that publicly declaring my willingness to one-box would increase the chance that I’m approached by Newcomb-like deals, then obviously I would declare my willingness to one-box. As it turns out, I’m trustworthy and dependable in real life, because of both a genetic predisposition towards pro-social behavior (including valuing things occurring after my death) and a reflective endorsement of the myriad benefits of behaving in that way.
You had not previously considered this situation possible, and had made no precommitments about what to do in such a scenario.
I decided a long time ago to cooperate with myself as a general principle, and I think that was more a recognition of my underlying personality than it was a conscious change.
If the copy is perfect, it seems unreasonable to me to not draw a causal arrow between my action and my copy’s action, as I cannot justify the assumption that my action will be independent of my perfect copy’s action. If I estimate that the influence is sufficiently high, then it seems that (3,3) is a better option than (0,0). I’m moderately confident a hypothetical me which knew about causal models but hadn’t thought about identity or intertemporal cooperation would use the same line of reasoning to cooperate.
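A small sketch of the comparison being made, treating the estimated influence as a single probability q that the copy’s action matches mine. The (3,3) and (0,0) diagonal payoffs come from the comment above; the off-diagonal temptation and sucker values are assumptions added for illustration.

```python
# Sketch: the "sufficiently high influence" estimate, with q = P(copy's action matches mine).
# Diagonal payoffs (3, 3) and (0, 0) are from the comment; off-diagonal values are assumed.

R, P = 3, 0        # both cooperate / both defect (as in the comment)
T, S = 5, 0        # I defect while copy cooperates / vice versa (assumed)

def eu_cooperate(q):
    return q * R + (1 - q) * S

def eu_defect(q):
    return q * P + (1 - q) * T

# Cooperation wins once q*R + (1-q)*S > q*P + (1-q)*T, i.e. q > (T - S) / ((T - S) + (R - P)).
threshold = (T - S) / ((T - S) + (R - P))
print(threshold)                          # 0.625 with these numbers
print(eu_cooperate(0.9), eu_defect(0.9))  # 2.7 vs 0.5 -- cooperate if influence is this high
```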
In either case, it seems to me this is something that the physics module (i.e. what gives you P(O_j|do(A))) should compute, and only baked into the decision theory by the rules about what sort of causal graphs you think are likely.
The problem is the do(A) part: the do(.) function ignores logical acausal connections between nodes. That was the theme of this post.
If the copy is perfect, it seems unreasonable to me to not draw a causal arrow between my action and my copy’s action, as I cannot justify the assumption that my action will be independent of my perfect copy’s action.
I agree! If the copy is perfect, there is a connection. However, the connection is not a causal one.
Obviously you want to take the action that maximizes your expected utility, according to probability-weighted outcomes. The question is how you check the outcome that would happen if you took a given action.
Causal counterfactual reasoning prescribes evaluating counterfactuals by intervening on the graph using the do(.) function. This (roughly) involves identifying your action node A, ignoring its causal ancestors, overwriting the node with the function const a (where a is the action under consideration), and seeing what happens. This usually works fine, but there are some cases where it fails to correctly compute the outcomes (namely, where others are reasoning about the contents of A, and their internal representations of A were not affected by your do(A=a)).
This is not fundamentally a problem of retrocausality; it’s fundamentally a problem of not knowing how to construct good counterfactuals. What does it mean to consider that a deterministic algorithm returns something that it doesn’t return? do(.) says that it means “imagine you were not you, but were instead const a, while other people continue reasoning as if you were you”. It would actually be really surprising if this worked out in situations where others have internal representations of the contents of A (which do(A=.) stomps all over).
You answered that you intuitively feel like you should draw an arrow between you and your clone in the above thought experiment. I agree! But constructing a graph like this (where things that are computed via the same process must have the same output) is actually not something that CDT does. This problem in particular was the motivation behind TDT (which uses a different function besides do(.) to construct counterfactuals that preserve the fact that identical computations will have identical outputs). It sounds like we probably have similar intuitions about decision theory, but perhaps different ideas about what the do(.) function is capable of?
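A sketch of the contrast being drawn here, using ‘template’ and ‘give?’ node names in the spirit of the mirror token trade discussed elsewhere in this exchange. The node functions, payoffs, and the graph itself are illustrative stand-ins, not the post’s exact model.

```python
# Sketch: intervening at the action node versus intervening at the shared template.
# Everything here is an illustrative stand-in.

def template():
    """The shared decision procedure; both agents run this same computation."""
    return "keep"   # whatever the template actually outputs

def world(my_template, their_template):
    my_action = my_template()        # 'give?' node: computed from my copy of the template
    their_action = their_template()  # their decision: computed from the *same* template
    return my_action, their_action

# CDT-style do(give? = "give"): overwrite only my action node with a constant,
# leaving the other agent's node to be computed from the template as before.
def do_at_action(action):
    my_action = action               # const a, causal parents ignored
    their_action = template()        # unchanged -- their internal copy of me still says "keep"
    return my_action, their_action

# Intervening at the shared template instead: both nodes downstream of it move together.
def do_at_template(action):
    forced = lambda: action
    return world(forced, forced)

print(do_at_action("give"))    # ('give', 'keep')  -- counterfactual where only I change
print(do_at_template("give"))  # ('give', 'give')  -- identical computations stay identical
```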
This usually works fine, but there are some cases where it fails to correctly compute the outcomes (namely, where others are reasoning about the contents of A, and their internal representations of A were not affected by your do(A=a)).
I still think this should be solved by the physics module.
For example, consider two cases. In case A, Ekman reads everything you’ve ever written on decision theory before September 26th, 2014, and then fills the boxes as if he were Omega, and then you choose whether to one-box or two-box. Ekman’s a good psychologist, but his model of your mind is translucent to you at best: you think it’s more likely than not that he’ll guess correctly what you’ll pick, but you know that his guess is mediated only by what you’ve already written, which you can’t change.
In case B, Ekman watches your face as you choose whether to press the one-box button or the two-box button without being able to see the buttons (or your finger), and then predicts your choice. Again, his model of your mind is translucent to you at best; probably he’ll guess correctly, but you don’t know specifically what he’s basing his decision on (and suppose that even if you did, you know that you don’t have sufficient control over your features to prevent information from leaking).
It seems to me that the two cases deserve different responses: in case A, you don’t think your current thoughts will impact Ekman’s move, but in case B, you do. In a normal token trade, you don’t think your current thoughts will impact your partner’s move, but in a mirror token trade, you do. Those differences in belief are because of actual changes in the perceived causal features of the situation, which seems sensible to me.
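A sketch of the structural difference between the two cases, as I read them: in case A the prediction’s inputs sit in fixed history that do(choice) cannot reach, while in case B they are downstream of the choice itself. The helper functions are stand-ins.

```python
# Sketch: where the prediction's inputs sit relative to the intervened-on choice.
# All functions are illustrative stand-ins.

def predict_from(evidence):
    """Stand-in predictor: guesses 'one-box' only if the evidence points that way."""
    return "one-box" if "one-box" in str(evidence) else "two-box"

def leak(choice):
    """Stand-in for facial tells that are downstream of the current decision process."""
    return choice

# Case A: the prediction is computed from past writings, which do(my_choice) cannot reach.
def case_a(my_choice):
    past_writings = "everything written before 2014-09-26"  # fixed history
    return predict_from(past_writings), my_choice

# Case B: the prediction is computed from tells that are downstream of the choice itself.
def case_b(my_choice):
    return predict_from(leak(my_choice)), my_choice

print(case_a("one-box")[0], case_a("two-box")[0])  # same prediction either way
print(case_b("one-box")[0], case_b("two-box")[0])  # the prediction tracks the choice
```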
That is, I think this is a failure of the process you’re using to build causal maps, not the way you’re navigating those causal maps once they’re built. I keep coming back to the criterion “does a missing arrow imply independence?” because that’s the primary criterion for building useful causal maps, and if you have ‘logical nodes’ like “the decision made by an agent with a template X” then it doesn’t make sense to have a copy of that logical node elsewhere that’s allowed to have a distinct value.
That is, I agree that this question is important:
What does it mean to consider that a deterministic algorithm returns something that it doesn’t return?
But my answer to it is “don’t try to intervene at a node unless your causal model was built under the assumption you could intervene at that node.” The mirror token trade causal map you used in this post works if you intervene at ‘template,’ but I argue it doesn’t work if you intervene at ‘give?’ unless there’s an arrow that points from ‘give?’ to ‘their decision.’
It sounds like we probably have similar intuitions about decision theory, but perhaps different ideas about what the do(.) function is capable of?
I think I see the do(.) operator as less capable than you do; in cases where the physicality of our computation matters, we need to have arrows pointing out of the node where we intervene that we don’t need when we can ignore the impacts of having to physically perform computations in reality. Furthermore, it seems to me that when we’re at the level where how we physically process possibilities matters, ‘decision theory’ may not be a useful concept anymore.
Cool, it sounds like we mostly agree. For instance, I agree that once you set up the graph correctly, you can intervene do(.) style and get the Right Answer. The general thrust of these posts is that “setting up the graph correctly” involves drawing in lines / representing world-structure that is generally considered (by many) to be “non-causal”.
Figuring out what graph to draw is indeed the hard part of the problem—my point is merely that “graphs that represent the causal structure of the universe and only the causal structure of the universe” are not the right sort of graphs to draw, in the same way that a propensity theory of probability that only allows information to propagate causally is not a good way to reason about probabilities.
Figuring out what sort of graphs we do want to intervene on requires stepping beyond a purely causal decision theory.