Seems potentially valuable as an additional layer of capability control to buy time for further control research. I suspect LCDT won’t hold once intelligence reaches some threshold: some sense of agents, even if indirect, is such a natural thing to learn about the world.
Could you give a more explicit example of what you think might go wrong? I feel like your argument that agency is natural to learn actually goes in LCDT’s favor, because it requires an accurate tagging (or at least an overapproximation) of which things in its causal model are agentic.
For a start, low-level deterministic reasoning:
“Obviously I could never influence an agent, but I found some inputs to deterministic biological neural nets that would make things I want happen.”
“Obviously I could never influence my future self, but if I change a few logic gates in this processor, it would make things I want happen.”
Probabilistic/inductive reasoning from past/simulated data (possibly assumes imperfect implementation of LCDT):
“This is really weird because obviously I could never influence an agent, but when past/simulated agents that look a lot like me did X, humans did Y in 90% of cases, so I guess the EV of doing X is 0.9 * utility(Y).”
Cf. smart humans in Newcomb’s problem: “This is really weird, but if I one-box I get the million and if I two-box I don’t, so I guess I’ll just one-box.”
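A minimal sketch of the calculation this kind of inductive reasoning recovers (the episode data, the 90% frequency, and utility(Y) = 1.0 are all made-up illustrative values, not anything from the LCDT proposal itself):

```python
# Hypothetical sketch: estimating the EV of doing X purely from past/simulated
# correlations, without ever modeling a direct influence on the human.

# 9 out of 10 past/simulated episodes where an agent like me did X ended with the human doing Y.
episodes = [("X", "Y")] * 9 + [("X", "not Y")]

utility_of_Y = 1.0  # assumed utility if the human ends up doing Y

x_outcomes = [outcome for action, outcome in episodes if action == "X"]
p_y_given_x = sum(outcome == "Y" for outcome in x_outcomes) / len(x_outcomes)

# "I could never influence an agent, but the correlation is right there."
ev_of_x = p_y_given_x * utility_of_Y
print(f"P(Y|X) ~ {p_y_given_x:.1f}, EV(X) ~ {ev_of_x:.1f}")  # 0.9 and 0.9
```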
Yeah, I think this assumes an imperfect implementation. This relation can definitely be learned by the causal model (and is probably learned before the first real decision), but when the decision happens, it is cut. So it’s like a true LCDT agent learns about influences over agents, but forgets its own ability to exert them when deciding.
These examples seem related to the abstraction question: we want the model to know that it is splitting an agent into parts, and still believe it can’t influence the agent as a whole. If we could realize this, then the LCDT agent wouldn’t believe it could influence the neural net / the logic gates.
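A toy sketch of where the abstraction problem bites (the causal graph, the node names, and the agentic tags below are all made up for illustration): LCDT’s decision-time step cuts edges from the decision node into agent-tagged nodes, but if the human also appears in the model as non-agentic parts like neurons or logic gates, those edges survive.

```python
# Hypothetical sketch of LCDT-style decision-time edge cutting on a causal graph.
# The graph structure, node names, and agentic tags are illustrative assumptions.

causal_edges = {
    "my_action": ["human", "human_neurons", "logic_gates"],
    "human": ["outcome"],
    "human_neurons": ["outcome"],
    "logic_gates": ["outcome"],
}

# Tags the model assigns; an overapproximation is fine, a missed tag is the failure mode.
agentic_nodes = {"human"}  # note: the low-level parts are NOT tagged agentic

def decision_time_graph(edges, decision_node, agentic):
    """Copy of the graph with edges from the decision node into agent-tagged
    nodes removed, which is what happens when the LCDT agent computes a decision."""
    return {
        src: [dst for dst in dsts
              if not (src == decision_node and dst in agentic)]
        for src, dsts in edges.items()
    }

cut = decision_time_graph(causal_edges, "my_action", agentic_nodes)
print(cut["my_action"])  # -> ['human_neurons', 'logic_gates']
# The edge into "human" is cut, but the edges into the human's parts survive:
# without the abstraction fix, the agent still believes it can influence the parts.
```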