AlexMennen comments on Troll Bridge

AlexMennen 15 Sep 2019 16:34 UTC
LW: 2 AF: 1
I think the counterfactuals used by the agent are the correct counterfactuals for someone else to use while reasoning about the agent from the outside, but not the correct counterfactuals for the agent to use while deciding what to do. After all, knowing the agent’s source code, if you see it start to cross the bridge, it is correct to infer that its reasoning is inconsistent, and you should expect to see the troll blow up the bridge. But while deciding what to do, the agent should be able to reason about the purely causal effects of its counterfactual behavior, screening out other logical implications.
> Also, counterfactuals which predict that the bridge blows up seem to be saying that the agent can control whether PA is consistent or inconsistent.
I disagree that that’s what’s happening. The link between the consistency of the reasoning system and the behavior of the agent exists because the consistency of the reasoning system controls the agent’s behavior, rather than the other way around. Since the agent is selecting outcomes based on their consequences, it does make sense to speak of the agent choosing actions to some extent, but speaking of the logical implications of the agent’s actions for the consistency of formal systems as “controlling” that consistency seems to me like an inappropriate attribution of agency.
- abramdemski 2 Oct 2019 20:42 UTC
  LW: 6 AF: 3
  I agree with everything you say here, but I read you as thinking you disagree with me.
  > I think the counterfactuals used by the agent are the correct counterfactuals for someone else to use while reasoning about the agent from the outside, but not the correct counterfactuals for the agent to use while deciding what to do.
  Yeah, that’s the problem I’m pointing at, right?
  > I disagree that that’s what’s happening. The link between the consistency of the reasoning system and the behavior of the agent exists because the consistency of the reasoning system controls the agent’s behavior, rather than the other way around. Since the agent is selecting outcomes based on their consequences, it does make sense to speak of the agent choosing actions to some extent, but speaking of the logical implications of the agent’s actions for the consistency of formal systems as “controlling” that consistency seems to me like an inappropriate attribution of agency.
  I think we just agree on that? As I responded to another comment here:
  > The point here is that the agent described is acting like EDT is supposed to: it is checking whether its action implies X. If yes, it is acting as if it controls X in the sense that it is deciding which action to take using those implications. I’m not arguing at all that we should think “implies X” is causal, nor even that the agent has opinions on the matter; only that the agent seems to be doing something wrong, and one way of analyzing what it is doing wrong is to take a CDT stance and say “the agent is behaving as if it controls X”, in the same way that CDT says to EDT “you are behaving as if correlation implies causation” even though EDT would not assent to this interpretation of its decision.
- Gurkenglas 9 Jul 2021 18:25 UTC
  LW: 2 AF: 1
  Suppose the bridge is safe iff there’s a proof that the bridge is safe. Then you would forbid the reasoning “Suppose I cross. I must have proven it’s safe. Then it’s safe, and I get 10. Let’s cross.”, which seems sane enough in the face of Löb’s theorem.
  - AlexMennen 12 Jul 2021 22:58 UTC
    2 points
    > Suppose the bridge is safe iff there’s a proof that the bridge is safe.
    Then you can prove the bridge is safe without any reference to your own actions.
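    The step from the supposition to a proof of safety can be spelled out as a short derivation. This is only a sketch under assumptions not stated in the thread: that the agent reasons in PA, that S is a sentence expressing "the bridge is safe", that \Box S abbreviates "PA proves S", and that the supposed biconditional is itself provable in PA.

    ```latex
    % S: "the bridge is safe"; \Box S: "PA proves S" (assumed formalization)
    \begin{align*}
    &\mathrm{PA} \vdash S \leftrightarrow \Box S && \text{(the supposition, assumed provable in PA)} \\
    &\mathrm{PA} \vdash \Box S \rightarrow S && \text{(right-to-left direction of the line above)} \\
    &\mathrm{PA} \vdash S && \text{(L\"ob's theorem applied to the line above)}
    \end{align*}
    ```

    Nothing in the derivation mentions the agent or its action, which is the sense in which safety is provable without reference to what the agent does.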
    - Gurkenglas 13 Jul 2021 7:38 UTC
      2 points
      Suppose the bridge is safe iff A() would decide to cross?
      - AlexMennen 18 Jul 2021 20:27 UTC
        2 points
        Not good enough. If I know that the bridge isn’t safe, and that I’m not going to cross it, then I know that the bridge is safe iff I’m going to cross it, but not crossing it is still the correct decision.
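
The last point, that a material biconditional can hold while telling you nothing about which action is better, can be checked by brute force. The following is a minimal sketch: the names `iff` and `utility` are hypothetical, and the payoffs are illustrative assumptions (the thread only mentions getting 10 for crossing safely; the -10 for crossing an unsafe bridge and 0 for not crossing are assumed here).

```python
def iff(p: bool, q: bool) -> bool:
    """Material biconditional: true exactly when p and q share a truth value."""
    return p == q


def utility(safe: bool, cross: bool) -> int:
    """Illustrative payoffs (assumed): +10 cross safely, -10 cross unsafe, 0 stay."""
    if not cross:
        return 0
    return 10 if safe else -10


# The world described in the comment: the bridge is not safe,
# and the agent is not going to cross it.
safe, cross = False, False

# "The bridge is safe iff I'm going to cross it" holds, because both sides are false.
biconditional_holds = iff(safe, cross)

# Causal comparison: hold the bridge's actual state fixed and vary only the action.
best_action = max([False, True], key=lambda c: utility(safe, c))

print("biconditional holds:", biconditional_holds)
print("best action is to cross:", best_action)
```

In this sketch the biconditional comes out true while the best action is still not to cross, so knowing the biconditional alone does not license crossing.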