The key point is that “acting like an LDT agent” in contexts where your commitment causally influences others’ predictions of your behavior does not imply you’ll “act like an LDT agent” in contexts where that doesn’t hold. (And I would dispute that we should label making a commitment to a mutually beneficial deal as “acting like an LDT agent,” anyway.) In principle, maybe the simplest generalization of the former is LDT. But if doing LDT things in the latter contexts is materially costly for you (e.g. paying in a truly one-shot Counterfactual Mugging), it seems to me that LDT would be selected against.
ETA: The more action-relevant example in the context of this post, rather than one-shot CM, is: “Committing to a fair demand, when you have values and priors such that a more hawkish demand would be preferable ex ante, and the other agents you’ll bargain with don’t observe your commitment before they make their own commitments.” I don’t buy that that sort of behavior is selected for, at least not strongly enough to justify the claim I respond to in the third section.
(And I would dispute that we should label making a commitment to a mutually beneficial deal as “acting like an LDT agent,” anyway.)
You said “Bob commits to LDT ahead of time” in the paragraph I quoted; I was referring to that.
But if doing LDT things in the latter contexts is materially costly for you (e.g. paying in a truly one-shot Counterfactual Mugging), it seems to me that LDT would be selected against.
I think a CDT agent would pre-commit to paying in a one-off Counterfactual Mugging, since they have a 50% chance of gaining $10,000 and a 50% chance of losing $100. Or if they don’t know that a Counterfactual Mugging is going to happen, they’d have an incentive to broadly pre-commit to pay out in similar situations (essentially, acting like an LDT agent). Or if they won’t do either of those things, they’ll end up with lower expected future resources than an LDT agent.
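As a minimal sketch of that expected-value comparison (the fifty-fifty coin, $10,000 prize, and $100 cost are the standard Counterfactual Mugging numbers used above; nothing else is assumed):

```python
# Ex ante expected value of pre-committing to pay in a one-off Counterfactual
# Mugging, evaluated before Omega flips the coin.
p_heads = 0.5      # fair coin, as in the standard setup
prize = 10_000     # payout if the coin lands heads and you would have paid on tails
cost = 100         # amount you pay if the coin lands tails

ev_commit = p_heads * prize - (1 - p_heads) * cost  # 0.5*10000 - 0.5*100 = 4950.0
ev_refuse = 0.0                                     # never pay, never receive the prize

print(f"EV(commit) = ${ev_commit:,.0f}, EV(refuse) = ${ev_refuse:,.0f}")
```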
The more action-relevant example [...] Committing to a fair demand, when you have values and priors such that a more hawkish demand would be preferable
Same as above: I think either CDT agents would tend to make pre-commitments to act LDT-like in such situations, or they will lose expected resources compared to LDT agents. You can’t have your CDT cake and eat it too!
In the context of that quote, I was saying why I don’t buy the claim that following LDT gives you advantages over committing to, in future problems, do stuff that’s good for you to commit to do ex ante even if it would be bad for you ex post had you not been committed.
What is selected for is being the sort of agent such that, when others observe you, they update towards doing stuff that’s good for you. This is distinct from being the sort of agent who does stuff that would have helped you if you had been able to shape others’ beliefs / incentives, when in fact you didn’t have such an opportunity.
I think a CDT agent would pre-commit to paying in a one-off Counterfactual Mugging
Sorry, I guess I wasn’t clear about what I meant by “one-shot” here (or maybe I just used the wrong term): I was assuming the agent doesn’t have the opportunity to commit in this way. They just find themselves presented with the situation.
Same as above
Hmm, I’m not sure you’re addressing my point here:
Imagine that you’re an AGI, and either in training or earlier in your lifetime you faced situations where it was helpful for you to commit to, as above, “do stuff that’s good for you to commit to do ex ante even if it would be bad for you ex post had you not been committed.” You tended to do better when you made such commitments.
But now you find yourself thinking about this commitment races stuff. And, importantly, you have not previously broadcast credible commitments to a bargaining policy to your counterpart. Do you have compelling reasons to think you and your counterpart have been selected to have decision procedures that are so strongly logically linked, that your decision to demand more than a fair bargain implies your counterpart does the same? I don’t see why. But that’s what we’d need for the Fair Policy to work as robustly as Eliezer seems to think it does.
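To spell out why that linkage matters, here is a minimal sketch contrasting the two cases; the payoff numbers are hypothetical, chosen only for illustration:

```python
# Expected shares of a surplus of 1, under two assumptions about how strongly
# your demand choice is logically linked to your counterpart's.
fair_share = 0.5   # each side's share when both make compatible fair demands
hawk_share = 0.9   # your share if you demand more while the counterpart demands fairly
conflict = 0.0     # each side's share when the two demands are incompatible

# Perfect logical linkage: demanding hawkishly implies your counterpart does too,
# so the demands are incompatible and bargaining breaks down.
ev_hawk_linked = conflict        # 0.0 < fair_share, so the fair demand wins

# No linkage: the counterpart's fair demand is fixed regardless of what you decide.
ev_hawk_unlinked = hawk_share    # 0.9 > fair_share, so the hawkish demand wins

print(ev_hawk_linked, ev_hawk_unlinked, fair_share)
```

Under strong logical linkage the fair demand dominates; without it, the hawkish demand does, which is why the Fair Policy seems to need that linkage to work robustly.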
In the context of that quote, I was saying why I don’t buy the claim that following LDT gives you advantages over committing to, in future problems, do stuff that’s good for you to commit to do *ex ante* even if it would be bad for you *ex post* had you not been committed.
Yes, but isn’t this essentially the same as LDT? It seems to me that different sections of your essay are inconsistent with each other: in earlier sections you argue that CDT agents might not adopt LDT-recommended policies and so will have problems with bargaining, but in the last section you say that CDT agents are not at a competitive disadvantage because they can simply commit to act like LDT agents all the time. But if they so commit, the problems with bargaining won’t come up. I think it would make more sense to argue that, empirically, situations selecting for LDT simply won’t arise while agents are being selected (but will then arise and be important later).
What is selected for is being the sort of agent such that, *when others observe you,* they update towards doing stuff that’s good for you
I don’t quite understand what you mean here. Are you saying that CDT agents will only cooperate if they think it will be causally beneficial, by causing them to have a good reputation with other agents? But we were discussing a case (Counterfactual Mugging) where they would want to pre-commit to act in ways that would be non-causally beneficial. So I think there would be selection to act non-causally in such cases (unless, again, you just think such situations will never arise, but that’s a different argument).
Do you have compelling reasons to think you and your counterpart have been selected to have decision procedures that are so strongly logically linked, that your decision to demand more than a fair bargain implies your counterpart does the same
I don’t see why you have to assume that your counterpart is strongly logically linked with you; there are other reasons that you might not want to demand too much. Maybe you know their source code and can simulate that they will not accept a too-high demand. Or perhaps you think, based on empirical evidence or a priori reasoning, that most agents you might encounter will only accept a roughly fair allocation.
in earlier sections you argue that CDT agents might not adopt LDT-recommended policies and so will have problems with bargaining
That wasn’t my claim. I was claiming that even if you’re an “LDT” agent, there’s no particular reason to think all your bargaining counterparts will pick the Fair Policy given you do. This is because:
Your bargaining counterparts won’t necessarily consult LDT.
Even if they do, it’s super unrealistic to think of the decision-making of agents in high-stakes bargaining problems as entirely reducible to “do what [decision theory X] recommends.”
Even if decision-making in these problems were as simple as that, why should we think all agents will converge to using the same simple method of decision-making? Seems like if an agent is capable of de-correlating their decision-making in bargaining from their counterpart’s, and their counterpart knows this or anticipates it on priors, that agent has an incentive to do so if they can be sufficiently confident that their counterpart will concede to their hawkish demand (see the sketch below).
So no, “committing to act like LDT agents all the time,” in the sense that is helpful for avoiding selection pressures against you, does not ensure you’ll have a decision procedure such that you have no bargaining problems.
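To put a number on that last incentive, here is a minimal sketch; the payoffs and the concession probability are hypothetical illustrative values:

```python
# If the counterpart's decision is de-correlated from yours, the hawkish demand
# is a gamble on whether they concede. Same illustrative payoffs as before.
fair_share, hawk_share, conflict = 0.5, 0.9, 0.0

def ev_hawk(p_concede: float) -> float:
    """Expected share from a hawkish demand, given probability p_concede
    that the (de-correlated) counterpart concedes rather than fights."""
    return p_concede * hawk_share + (1 - p_concede) * conflict

# The hawkish demand beats the fair demand whenever ev_hawk(p) > fair_share, i.e.
# p > (fair_share - conflict) / (hawk_share - conflict) ≈ 0.56 with these numbers.
threshold = (fair_share - conflict) / (hawk_share - conflict)
print(f"Hawkish demand pays iff P(concede) > {threshold:.2f}")
```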
But we were discussing a case (Counterfactual Mugging) where they would want to pre-commit to act in ways that would be non-causally beneficial.
I’m confused: the commitment is to act in a certain way that, had you not committed, wouldn’t be beneficial unless you appealed to acausal (and updateless) considerations. But the act of committing itself has causal benefits.
there are other reasons that you might not want to demand too much. Maybe you know their source code and can simulate that they will not accept a too-high demand. Or perhaps you think, based on empirical evidence or a priori reasoning, that most agents you might encounter will only accept a roughly fair allocation.
I agree these are both important possibilities, but:
The reasoning “I see that they’ve committed to refuse high demands, so I should only make a compatible demand” can just be turned on its head and used by the agent who commits to the high demand.
One might also think on priors that some agents might be committed to high demands; therefore, strictly insisting on fair demands against all agents is risky (a minimal sketch of this is below).
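Here is that sketch; the prior over hawkish committers and the payoff values are hypothetical numbers chosen purely for illustration:

```python
# If a fraction q of counterparts is already committed to a high demand,
# a strict "always demand the fair share" policy collides with exactly those agents.
q = 0.2              # prior probability the counterpart is committed to a high demand
fair_share = 0.5     # your share when both sides demand fairly
concede_share = 0.1  # your share if you cave to a hawkish committer
conflict = 0.0       # your share when demands are incompatible

# Policy A: always demand the fair share (conflict against every hawkish committer).
ev_always_fair = q * conflict + (1 - q) * fair_share            # = 0.40
# Policy B: concede to hawkish committers, demand fairly otherwise.
ev_concede_to_hawks = q * concede_share + (1 - q) * fair_share  # = 0.42

print(f"EV(always fair) = {ev_always_fair:.2f}, "
      f"EV(concede to hawks) = {ev_concede_to_hawks:.2f}")
```

With these made-up numbers, the strictly fair policy does worse in expectation, which is the sense in which it is risky.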
I was specifically replying to the claim that the sorts of AGIs who would get into high-stakes bargaining would always avoid the catastrophic conflicts that bargaining problems can cause; such a claim requires something stronger than the considerations you’ve raised, i.e., an argument that all such AGIs would adopt the same decision procedure (and account for logical causation) and therefore coordinate their demands.
(By default if I don’t reply further, it’s because I think your further objections were already addressed—which I think is true of some of the things I’ve replied to in this comment.)