I think the counterfactuals used by the agent are the correct counterfactuals for someone else to use while reasoning about the agent from the outside, but not the correct counterfactuals for the agent to use while deciding what to do. After all, knowing the agent’s source code, if you see it start to cross the bridge, it is correct to infer that its reasoning is inconsistent, and you should expect to see the troll blow up the bridge. But while deciding what to do, the agent should be able to reason about purely causal effects of its counterfactual behavior, screening out other logical implications.
Also, counterfactuals which predict that the bridge blows up seem to be saying that the agent can control whether PA is consistent or inconsistent.
Disagree that that’s what’s happening. The link between the consistency of the reasoning system and the behavior of the agent exists because the consistency of the reasoning system controls the agent’s behavior, rather than the other way around. Since the agent is selecting outcomes based on their consequences, it does make sense to speak of the agent choosing actions to some extent, but describing the logical implications of the agent’s actions for the consistency of formal systems as “controlling” that consistency seems to me like an inappropriate attribution of agency.
I agree with everything you say here, but I read you as thinking you disagree with me.
I think the counterfactuals used by the agent are the correct counterfactuals for someone else to use while reasoning about the agent from the outside, but not the correct counterfactuals for the agent to use while deciding what to do.
Yeah, that’s the problem I’m pointing at, right?
Disagree that that’s what’s happening. The link between the consistency of the reasoning system and the behavior of the agent exists because the consistency of the reasoning system controls the agent’s behavior, rather than the other way around. Since the agent is selecting outcomes based on their consequences, it does make sense to speak of the agent choosing actions to some extent, but describing the logical implications of the agent’s actions for the consistency of formal systems as “controlling” that consistency seems to me like an inappropriate attribution of agency.
I think we just agree on that? As I responded to another comment here:
The point here is that the agent described is acting like EDT is supposed to—it is checking whether its action implies X. If yes, it is acting as if it controls X in the sense that it is deciding which action to take using those implications. I’m not arguing at all that we should think “implies X” is causal, nor even that the agent has opinions on the matter; only that the agent seems to be doing something wrong, and one way of analyzing what it is doing wrong is to take a CDT stance and say “the agent is behaving as if it controls X”—in the same way that CDT says to EDT “you are behaving as if correlation implies causation” even though EDT would not assent to this interpretation of its decision.
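To make the "behaving as if it controls X" reading concrete, here is a minimal probabilistic sketch in Python. It is not the proof-based agent under discussion: the prior on X, the correlation between X and crossing, and the payoffs of -10 and 0 are all assumed toy values, and the function names are hypothetical. The EDT-style evaluation conditions on the action and so lets the action shift its estimate of X. The CDT-style evaluation holds the prior on X fixed.

```python
# Toy numbers, all assumed for illustration only.
P_X = 0.2                                   # prior probability that X holds
P_CROSS_GIVEN_X = {True: 0.9, False: 0.1}   # X is correlated with crossing, not caused by it
ACTIONS = ("cross", "stay")


def utility(action: str, x: bool) -> float:
    """Hypothetical payoffs: crossing is bad when X holds, good otherwise; staying is neutral."""
    if action == "stay":
        return 0.0
    return -10.0 if x else 10.0


def p_action_given_x(action: str, x: bool) -> float:
    p_cross = P_CROSS_GIVEN_X[x]
    return p_cross if action == "cross" else 1.0 - p_cross


def p_x_given_action(action: str) -> float:
    """Bayes' rule: how much observing the action shifts belief in X."""
    numerator = P_X * p_action_given_x(action, True)
    denominator = numerator + (1.0 - P_X) * p_action_given_x(action, False)
    return numerator / denominator


def edt_value(action: str) -> float:
    """Evidential evaluation: treats the action as evidence about X."""
    p = p_x_given_action(action)
    return p * utility(action, True) + (1.0 - p) * utility(action, False)


def cdt_value(action: str) -> float:
    """Causal evaluation: the action cannot change X, so the prior is used."""
    return P_X * utility(action, True) + (1.0 - P_X) * utility(action, False)


edt_choice = max(ACTIONS, key=edt_value)
cdt_choice = max(ACTIONS, key=cdt_value)
```

With these assumed numbers the evidential evaluation prefers to stay and the causal one prefers to cross. The two differ only in whether the choice of action is allowed to move the probability of X, which is the sense of "acting as if it controls X" used above.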
Suppose the bridge is safe iff there’s a proof that the bridge is safe. Then you would forbid the reasoning “Suppose I cross. I must have proven it’s safe. Then it’s safe, and I get 10. Let’s cross.”, which seems sane enough in the face of Löb.
Then you can prove the bridge is safe without any reference to your own actions.
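Spelled out, the step looks like this, as a sketch that assumes "proof" means provability in the agent's theory (written PA here, as in the earlier comments). The symbols are notation introduced for this illustration: S stands for "the bridge is safe" and \Box S for "S is provable in PA".

```latex
\begin{align*}
&\text{Assumption:} && \mathrm{PA} \vdash S \leftrightarrow \Box S \\
&\text{Right-to-left direction:} && \mathrm{PA} \vdash \Box S \rightarrow S \\
&\text{L\"ob's theorem:} && \text{if } \mathrm{PA} \vdash \Box S \rightarrow S \text{, then } \mathrm{PA} \vdash S \\
&\text{Conclusion:} && \mathrm{PA} \vdash S
\end{align*}
```

No premise about the agent's action appears anywhere in the derivation.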
Suppose the bridge is safe iff A() would decide to cross?
Not good enough. If I know that the bridge isn’t safe, and that I’m not going to cross it, then I know that the bridge is safe iff I’m going to cross it, but not crossing it is still the correct decision.
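The point rests on the material biconditional being true whenever both sides are false. A minimal Python sketch, with hypothetical payoffs: the 10 for crossing a safe bridge comes from the quoted reasoning, while the -10 for crossing an unsafe bridge and the 0 for staying put are assumptions made for illustration.

```python
def iff(p: bool, q: bool) -> bool:
    """Material biconditional: true exactly when both sides have the same truth value."""
    return p == q


def payoff(cross: bool, safe: bool) -> int:
    """Hypothetical payoffs; only the 10 appears in the discussion, the rest are assumed."""
    if not cross:
        return 0
    return 10 if safe else -10


# What the agent knows in the scenario: the bridge is unsafe and it will not cross.
bridge_safe = False
will_cross = False

# "Safe iff I cross" is true here, but only because both sides are false.
biconditional_holds = iff(bridge_safe, will_cross)

# Holding the actual safety of the bridge fixed, compare the two actions.
payoff_if_cross = payoff(True, bridge_safe)
payoff_if_stay = payoff(False, bridge_safe)
best_action = "cross" if payoff_if_cross > payoff_if_stay else "stay"
```

The biconditional holds, yet crossing still comes out worse than staying, so knowing "safe iff I cross" does not by itself license crossing.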