Ah yes, you’re right there—my mistake.
However, I still don’t see how LCDT can make good decisions over adjustments to its simulation. That simulation must presumably eventually contain elements classed as agentic.
Then, given any adjustment X that influences the simulation outcome through both agentic and non-agentic paths, the LCDT agent will ignore the influence [relative to the prior] through the agentic paths. It will therefore usually be incorrect about what X is likely to accomplish.
It seems to me that you’ll also run into incoherence issues here: X can change things so that p(Y = 0) is 0.99 through a non-agentic path, while the agent assumes the equivalent of [p(Y = 0) is 0.5] through an agentic path.
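To make that concrete, here’s a toy numeric sketch (entirely my own construction; the two-channel structure and the numbers are just for illustration, not anything from the post):

```python
# Toy model: X influences Y through two channels, a non-agentic node N and a
# node A that the LCDT agent classes as an agent. Each channel copies X with
# probability 0.99, and Y = 0 exactly when both channels end up at 1.

def p_y0(x, cut_agent_link):
    p_copy = 0.99
    p_n1 = p_copy if x == 1 else 1 - p_copy           # non-agentic path: influence kept
    if cut_agent_link:
        p_a1 = 0.5                                    # agentic path: link cut, fall back to the prior
    else:
        p_a1 = p_copy if x == 1 else 1 - p_copy       # agentic path: what actually happens
    return p_n1 * p_a1                                # p(Y = 0)

print(p_y0(1, cut_agent_link=False))  # ~0.98: what setting X = 1 actually accomplishes
print(p_y0(1, cut_agent_link=True))   # ~0.50: what the LCDT agent expects it to accomplish
```

With the agentic link intact the model gives p(Y = 0) ≈ 0.98; with that link cut back to the prior it gives ≈ 0.5, so the agent badly misjudges what X accomplishes.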
I don’t see how an LCDT agent can make efficient adjustments to its simulation when it can’t judge those adjustments rationally in the presence of agentic elements (which, again, I assume must exist to simulate HCH).
That’s a really interesting thought—I definitely think you’re pointing at a real concern with LCDT now. Some thoughts:
Note that this problem is only with actually running agents internally, not with simply having the objective of imitating/simulating an agent—it’s just that LCDT will try to simulate that agent exclusively via non-agentic means.
That might actually be a good thing, though! If it’s possible to simulate an agent via non-agentic means, that certainly seems a lot safer than internally instantiating agents—though it might just be impossible to efficiently simulate an agent without instantiating any agents internally, in which case it would be a problem.
In some sense, the core problem here is just that the LCDT agent needs to understand how to decompose its own decision nodes into individual computations, so that it can compute things efficiently internally and know when and when not to label its internal computations as agents. How to decompose nodes into subnodes so as to properly work with multiple layers is a problem for all CDT-based decision theories, though, and it’s hopefully the sort of problem that finite factored sets will help with.
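As a very rough sketch of the kind of decomposition-plus-labelling step I have in mind (the graph structure and the is_agent flag here are illustrative assumptions, not a worked-out proposal):

```python
# One opaque "simulate HCH" node decomposed into subcomputations, each of
# which then gets (or doesn't get) an agent label. At decision time, links
# from the decision into agent-labelled nodes are the ones that get cut.

from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    is_agent: bool = False                   # label assigned during decomposition
    parents: list = field(default_factory=list)

def lcdt_parents(node, decision):
    """Parents used when predicting `node` as a consequence of `decision`:
    the decision's link into any agent-labelled node is severed."""
    if node.is_agent:
        return [p for p in node.parents if p is not decision]
    return node.parents

decision   = Node("LCDT decision")
arithmetic = Node("matrix multiply", is_agent=False, parents=[decision])
subagent   = Node("simulated human step", is_agent=True, parents=[decision, arithmetic])

print([p.name for p in lcdt_parents(arithmetic, decision)])  # ['LCDT decision']
print([p.name for p in lcdt_parents(subagent, decision)])    # ['matrix multiply']
```

The hard part, of course, is getting the decomposition and the labels right, rather than writing down the graph by hand.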
Ok, that mostly makes sense to me. I do think that there are still serious issues (but these may be due to my remaining confusions about the setup: I’m still largely reasoning about it “from outside”, since it feels like it’s trying to do the impossible).
For instance:
I agree that the objective of simulating an agent isn’t a problem. I’m just not seeing how that objective can be achieved without the simulation taken as a whole qualifying as an agent. Am I missing some obvious distinction here?
If sim_A(x) = A(x) for all x in X, then if A is behaviourally an agent over X, sim_A seems to be one too. (Replacing equality with approximate equality doesn’t seem to change the situation much in principle.)
[Pre-edit: Or is the idea that we’re usually only concerned with simulating some subset of the agent’s input->output mapping, and that a restriction of some function may have different properties from the original function? (agenthood being such a property)]
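Trying to write that pre-edit thought out slightly more precisely (my own notation; Agent_S is just a stand-in for whatever behavioural-agenthood predicate LCDT ends up using):

```latex
% Assumed: Agent_S(f) is a behavioural predicate that depends only on the
% restriction of f to the domain S.
\[
\text{If } \mathrm{sim}_A(x) = A(x) \ \text{ for all } x \in X,
\quad \text{then} \quad
\mathrm{Agent}_X(\mathrm{sim}_A) \iff \mathrm{Agent}_X(A);
\]
\[
\text{but knowing only } \mathrm{Agent}_{X'}(A) \text{ for some } X' \supsetneq X
\text{ tells us nothing about } \mathrm{Agent}_{X'}(\mathrm{sim}_A).
\]
```

So the restriction escape route only helps if agenthood is grounded on a strictly larger domain than the one we care about simulating.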
I can see that it may be possible to represent such a simulation as a group of nodes none of which is individually agentic—but presumably the same could be done with a human. It can’t be ok for LCDT to influence agents based on having represented them as collections of individually non-agentic components.
Even if sim_A is constructed as a Chinese room (w.r.t. agenthood), it’s behaving collectively as an agent.
“it’s just that LCDT will try to simulate that agent exclusively via non-agentic means”—mostly agreed, and agreed that this would be a good thing (to the extent possible).
However, I do think there’s a significant difference between e.g.:
[LCDT will not aim to instantiate agents] (true)
vs
[LCDT will not instantiate agents] (potentially false: they may be side-effects)
Side-effect-agents seem plausible if e.g.:
a) The LCDT agent applies adjustments over collections within its simulation.
b) The kind of adjustment that takes [useful non-agent] to [more useful non-agent] sometimes also takes [useful non-agent] to [agent].
Here it seems important that LCDT may reason poorly if it believes that it might create an agent. I agree that pre-decision-time processing should conclude that LCDT won’t aim to create an agent. I don’t think it will conclude that it won’t create an agent.
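A toy sketch of why indifference isn’t avoidance (my own construction; the numbers and field names are purely illustrative):

```python
# Each candidate adjustment has some effect through non-agentic paths and some
# effect through any agent it happens to create. LCDT keeps the first and
# evaluates the second at the prior (scored here as contributing nothing), so
# nothing in the comparison ever penalises "creates an agent".

adjustments = [
    {"name": "tweak-1", "via_nonagentic": 0.6, "via_created_agent": 0.0, "creates_agent": False},
    {"name": "tweak-2", "via_nonagentic": 0.9, "via_created_agent": -5.0, "creates_agent": True},
]

def true_value(adj):
    return adj["via_nonagentic"] + adj["via_created_agent"]

def lcdt_value(adj):
    return adj["via_nonagentic"]            # influence through the created agent is cut

print(max(adjustments, key=lcdt_value)["name"])  # tweak-2: the agent-creating adjustment wins
print(max(adjustments, key=true_value)["name"])  # tweak-1: what we'd actually prefer
```

Nothing here is aiming at agent-creation; it just isn’t being tracked.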
Agreed that finite factored sets seem promising for addressing any issues that are essentially artefacts of the representation. However, the above seem more fundamental, unless I’m missing something.
Assuming this is actually a problem, it struck me that it may be worth thinking about a condition vaguely like:
An LCDT_n agent cuts links at decision time to every agent other than [LCDT_m agents where m > n].
The idea being to specify a weaker condition that does enough forwarding-the-guarantee to allow safe instantiation of particular types of agent while still avoiding deception.
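A rough sketch of the link-keeping rule I mean (my own construction; the lcdt_level annotation is hypothetical, not part of the LCDT proposal):

```python
def keeps_link(decider_level, target_is_agent, target_lcdt_level=None):
    """Would an LCDT_n decider (n = decider_level) keep its decision-time
    link into this target node?"""
    if not target_is_agent:
        return True                          # non-agents: links always kept
    if target_lcdt_level is not None and target_lcdt_level > decider_level:
        return True                          # LCDT_m agents with m > n: kept
    return False                             # every other agent: link cut

n = 2
print(keeps_link(n, target_is_agent=True))                        # human, etc.: False
print(keeps_link(n, target_is_agent=True, target_lcdt_level=3))   # LCDT_3 subagent: True
print(keeps_link(n, target_is_agent=True, target_lcdt_level=1))   # LCDT_1 agent: False
print(keeps_link(n, target_is_agent=False))                       # plain computation: True
```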
I’m far from clear that anything along these lines would help: it probably doesn’t work, and in any case it doesn’t seem to solve the side-effect-agent problem, since [complete indifference to influence on X] and [robustly avoiding creation of X] seem fundamentally incompatible.
Thoughts welcome. With luck I’m still confused.