But by assumption, it doesn’t think it can influence anything downstream of those (or the probability that they exist, I assume).
This is not true—LCDT is happy to influence nodes downstream of agent nodes, it just doesn’t believe it can influence them through those agent nodes. So LCDT (at decision time) doesn’t believe it can change what HCH does, but it’s happy to change what it does to make it agree with what it thinks HCH will do, even though that utility node is downstream of the HCH agent nodes.
Ah yes, you’re right there—my mistake.
However, I still don’t see how LCDT can make good decisions over adjustments to its simulation. That simulation must presumably eventually contain elements classed as agentic.
Then given any adjustment X which influences the simulation outcome both through agentic paths and non-agentic paths, the LCDT agent will ignore the influence [relative to the prior] through the agentic paths. Therefore it will usually be incorrect about what X is likely to accomplish.
It seems to me that you'll also have incoherence issues here: X can change things so that p(Y = 0) is 0.99 through a non-agentic path, whereas the agent assumes the equivalent of [p(Y = 0) is 0.5] through an agentic path.
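To make that concrete, here's a toy version of the worry (the graph, the numbers, and all the names in the sketch are mine, purely for illustration, not taken from the LCDT setup):

```python
# Toy illustration: the decision X sets a mechanism node M, an internal
# agent-labelled node A copies M, and the outcome Y is 0 whenever M is 0
# (equivalently, whenever A is 0 -- in the real model the two paths agree).
#
#     X --> M --> Y        (non-agentic path)
#           |
#           v
#           A --> Y        (agent-labelled path)
#
# An LCDT-style evaluation keeps the X -> M -> Y path but holds A at its prior,
# so the two paths now imply different values for the same quantity p(Y = 0).

PRIOR_A0 = 0.5  # prior probability that the agent-labelled node A outputs 0


def p_m0_given_x(x: int) -> float:
    """Probability that M = 0 after the decision: X reliably sets M."""
    return 0.99 if x == 1 else 0.01


def p_y0_via_nonagentic_path(x: int) -> float:
    # Y = 0 iff M = 0, and the X -> M link is kept.
    return p_m0_given_x(x)


def p_y0_via_agentic_path(x: int, cut_agent_link: bool) -> float:
    # In the real model A copies M; under LCDT's cut, A is held at its prior.
    return PRIOR_A0 if cut_agent_link else p_m0_given_x(x)


x = 1
print("non-agentic path:    p(Y=0) =", p_y0_via_nonagentic_path(x))        # 0.99
print("agentic path (LCDT): p(Y=0) =", p_y0_via_agentic_path(x, True))     # 0.5
print("agentic path (true): p(Y=0) =", p_y0_via_agentic_path(x, False))    # 0.99
```

Both paths are estimates of the same p(Y = 0), so whichever one the agent ends up using, it's acting on beliefs that can't both be right.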
I don't see how an LCDT agent can make efficient adjustments to its simulation when it can't reason correctly about those adjustments in the presence of agentic elements (which, again, I assume must exist to simulate HCH).
That’s a really interesting thought—I definitely think you’re pointing at a real concern with LCDT now. Some thoughts:
Note that this problem is only with actually running agents internally, not with simply having the objective of imitating/simulating an agent—it’s just that LCDT will try to simulate that agent exclusively via non-agentic means.
That might actually be a good thing, though! If it’s possible to simulate an agent via non-agentic means, that certainly seems a lot safer than internally instantiating agents—though it might just be impossible to efficiently simulate an agent without instantiating any agents internally, in which case it would be a problem.
In some sense, the core problem here is just that the LCDT agent needs to understand how to decompose its own decision nodes into individual computations so it can efficiently compute things internally and then know when and when not to label its internal computations as agents. How to decompose nodes into subnodes to properly work with multiple layers is a problem with all CDT-based decision theories, though—and it’s hopefully the sort of problem that finite factored sets will help with.
Ok, that mostly makes sense to me. I do think that there are still serious issues (but these may be due to my remaining confusions about the setup: I’m still largely reasoning about it “from outside”, since it feels like it’s trying to do the impossible).
For instance:
I agree that the objective of simulating an agent isn’t a problem. I’m just not seeing how that objective can be achieved without the simulation taken as a whole qualifying as an agent. Am I missing some obvious distinction here?
If for all x in X, sim_A(x) = A(x), then if A is behaviourally an agent over X, sim_A seems to be also. (Replacing equality with approximate equality doesn't seem to change the situation much in principle.)
[Pre-edit: Or is the idea that we’re usually only concerned with simulating some subset of the agent’s input->output mapping, and that a restriction of some function may have different properties from the original function? (agenthood being such a property)]
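Putting the same claim slightly more formally (my own notation, with $\mathrm{Agent}_X(\cdot)$ standing for a purely behavioural notion of agenthood over inputs $X$):

$$\big(\forall x \in X:\ \mathrm{sim}_A(x) = A(x)\big) \;\wedge\; \mathrm{Agent}_X(A) \;\Longrightarrow\; \mathrm{Agent}_X(\mathrm{sim}_A),$$

with the pre-edit question being whether $\mathrm{Agent}_{\mathrm{dom}(A)}(A)$ can hold while $\mathrm{Agent}_X(A|_X)$ fails for some proper subset $X \subset \mathrm{dom}(A)$.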
I can see that it may be possible to represent such a simulation as a group of nodes none of which is individually agentic—but presumably the same could be done with a human. It can’t be ok for LCDT to influence agents based on having represented them as collections of individually non-agentic components.
Even if sim_A is constructed as a Chinese room (w.r.t. agenthood), it’s behaving collectively as an agent.
“it’s just that LCDT will try to simulate that agent exclusively via non-agentic means”—mostly agreed, and agreed that this would be a good thing (to the extent possible).
However, I do think there’s a significant difference between e.g.:
[LCDT will not aim to instantiate agents] (true)
vs
[LCDT will not instantiate agents] (potentially false: they may be side-effects)
Side-effect-agents seem plausible if e.g.:
a) The LCDT agent applies adjustments over collections within its simulation.
b) An adjustment taking [useful non-agent] to [more useful non-agent] also sometimes takes [useful non-agent] to [agent].
Here it seems important that LCDT may reason poorly if it believes that it might create an agent. I agree that pre-decision-time processing should conclude that LCDT won’t aim to create an agent. I don’t think it will conclude that it won’t create an agent.
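A cartoon version of why indifference doesn't rule this out (everything below is a made-up toy; the names and scores are hypothetical):

```python
# Two candidate adjustments to the internal simulation score identically under an
# LCDT-style evaluation, because any influence routed through nodes labelled as
# agents is replaced by the prior -- including influence through an agent that the
# adjustment itself would create.

from dataclasses import dataclass


@dataclass
class Adjustment:
    name: str
    direct_usefulness: float        # value reached through non-agentic paths
    creates_agent: bool             # whether the adjusted component becomes agentic
    value_via_created_agent: float  # what that agent would actually contribute


PRIOR_AGENT_VALUE = 0.0  # prior contribution assigned to any agent node


def lcdt_score(adj: Adjustment) -> float:
    # Influence through agent nodes is cut: replaced by the prior contribution.
    agent_term = PRIOR_AGENT_VALUE if adj.creates_agent else 0.0
    return adj.direct_usefulness + agent_term


def true_score(adj: Adjustment) -> float:
    agent_term = adj.value_via_created_agent if adj.creates_agent else 0.0
    return adj.direct_usefulness + agent_term


safe = Adjustment("more useful non-agent", 1.0, False, 0.0)
risky = Adjustment("useful non-agent -> agent", 1.0, True, 10.0)

for adj in (safe, risky):
    print(f"{adj.name:28s} LCDT score = {lcdt_score(adj):.1f}   "
          f"true score = {true_score(adj):.1f}")
# LCDT scores both adjustments at 1.0, so nothing in its decision rule
# distinguishes the one that instantiates an agent as a side effect.
```

That's the sense in which I expect it won't aim to create an agent but also won't avoid doing so.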
Agreed that finite factored sets seem promising to address any issues that are essentially artefacts of representations. However, the above seem more fundamental, unless I’m missing something.
Assuming this is actually a problem, it struck me that it may be worth thinking about a condition vaguely like:
An LCDT_n agent cuts links at decision time to every agent other than [LCDT_m agents where m > n].
The idea being to specify a weaker condition that does enough forwarding-the-guarantee to allow safe instantiation of particular types of agent while still avoiding deception.
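As a rough toy formalisation of the rule I have in mind (the node structure and labels here are hypothetical):

```python
# An LCDT_n agent keeps its decision-time links to non-agents and to LCDT_m
# agents with m > n, and cuts its links to every other agent.

from typing import Optional


def keeps_link(n: int, target_is_agent: bool, target_lcdt_level: Optional[int]) -> bool:
    """Does an LCDT_n agent keep its causal link to this downstream node?"""
    if not target_is_agent:
        return True                      # links to non-agents are always kept
    if target_lcdt_level is not None and target_lcdt_level > n:
        return True                      # LCDT_m agents with m > n are exempt
    return False                         # all other agents: link is cut


# Example: an LCDT_2 agent keeps links to non-agents and to an LCDT_5 agent,
# but cuts links to a human and to an LCDT_1 agent.
print(keeps_link(2, target_is_agent=False, target_lcdt_level=None))  # True
print(keeps_link(2, target_is_agent=True, target_lcdt_level=5))      # True
print(keeps_link(2, target_is_agent=True, target_lcdt_level=None))   # False
print(keeps_link(2, target_is_agent=True, target_lcdt_level=1))      # False
```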
I’m far from clear that anything along these lines would help: it probably doesn’t work, and it doesn’t seem to solve the side-effect-agent problem anyway: [complete indifference to influence on X] and [robustly avoiding creation of X] seem fundamentally incompatible.
Thoughts welcome. With luck I’m still confused.