I had something like the following in mind: you are playing the PD against someone implementing “AlienDT”, about which you know nothing except (i) that it’s a completely different algorithm from the one you are implementing, and (ii) that it nonetheless outputs the same action/policy as your algorithm with some high probability (say 0.9) in a given decision problem.
It seems to me that you should definitely cooperate in this case, but I have no idea how logi-causalist decision theories are supposed to arrive at that conclusion (if at all).
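A minimal sketch of the expected-utility arithmetic behind “you should definitely cooperate” — assuming standard illustrative PD payoffs (T=5, R=3, P=1, S=0, which are not taken from the discussion above), and assuming the 0.9 correlation is treated as holding under both of your counterfactual actions:

```python
# Hypothetical numbers: standard PD payoffs, not specified in the thread.
T, R, P, S = 5, 3, 1, 0  # temptation, reward, punishment, sucker's payoff
p = 0.9                  # probability AlienDT outputs the same action as you

# If you cooperate: with prob p the alien also cooperates (R), else you get S.
ev_cooperate = p * R + (1 - p) * S   # 0.9*3 + 0.1*0 = 2.7
# If you defect: with prob p the alien also defects (P), else you exploit it (T).
ev_defect = p * P + (1 - p) * T      # 0.9*1 + 0.1*5 = 1.4

print(ev_cooperate, ev_defect)
```

Cooperation wins on these numbers, but only because the correlation is assumed to hold conditional on each of your possible actions — which is exactly the step a causalist reading of the problem would dispute.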
This is why I suggested naming FDT “functional decision theory” rather than “algorithmic decision theory”, when MIRI was discussing names.
Suppose that Alice is an LDT agent and Bob is an alien agent. The two swap source code. If Alice can verify that Bob (on the input “Alice’s source code”) behaves the same as Alice in the PD, then Alice will cooperate. This is because Alice sees that the two possibilities are (C,C) and (D,D), and the former has higher utility.
The same holds if Alice is confident in Bob’s relevant conditional behavior for some other reason, but can’t literally view Bob’s source code. Alice evaluates counterfactuals based on “how would Bob behave if I do X? what about if I do Y?”, since those are the differences that can affect utility; knowing the details of Bob’s algorithm doesn’t matter if those details are screened off by Bob’s functional behavior.
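A toy sketch of the screening-off point: Alice only needs Bob’s functional behavior conditional on her action, not his implementation. Here `bob_policy` stands for whatever mapping (from Alice’s action to Bob’s action) Alice is confident in; any two internally different Bobs with the same mapping are treated identically. Names and payoffs are illustrative, not from the discussion:

```python
# Alice picks the action that maximizes utility under "if I do X, Bob does
# bob_policy(X)". Bob's internals never appear, only his input-output behavior.
def alice_decide(bob_policy, payoff):
    return max(["C", "D"], key=lambda x: payoff[(x, bob_policy(x))])

payoff = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

mirror_bob = lambda a: a       # verified to match Alice's action
always_defect = lambda a: "D"  # defects regardless of Alice's action

# Against mirror_bob, Alice only sees (C,C) vs (D,D), so she cooperates:
print(alice_decide(mirror_bob, payoff))     # -> C
# Against always_defect, the options are (C,D) vs (D,D), so she defects:
print(alice_decide(always_defect, payoff))  # -> D
```

The design point is that `alice_decide` is a function of `bob_policy` alone, so the details of Bob’s algorithm are screened off exactly as described above.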
Hm. What kind of dependence is involved here? Doesn’t seem like a case of subjunctive dependence as defined in the FDT papers; the two algorithms are not related in any way beyond that they happen to be correlated.
Alice evaluates counterfactuals based on “how would Bob behave if I do X? what about if I do Y?”, since those are the differences that can affect utility...
Sure, but so do all agents that subscribe to standard decision theories. The whole DT debate is about what that means.