Ok, that mostly makes sense to me. I do think that there are still serious issues (but these may be due to my remaining confusions about the setup: I’m still largely reasoning about it “from outside”, since it feels like it’s trying to do the impossible).
For instance:
I agree that the objective of simulating an agent isn’t a problem. I’m just not seeing how that objective can be achieved without the simulation taken as a whole qualifying as an agent. Am I missing some obvious distinction here?
If for all x in X, sim_A(x) = A(x), then if A is behaviourally an agent over X, sim_A seems to be one too. (Replacing equality with approximate equality doesn’t seem to change the situation much in principle.)
[Pre-edit: Or is the idea that we’re usually only concerned with simulating some subset of the agent’s input->output mapping, and that a restriction of some function may have different properties from the original function? (agenthood being such a property)]
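To make the behavioural point concrete, here’s a minimal toy sketch (every name and predicate in it is hypothetical, chosen purely for illustration, not from the LCDT setup): if agenthood is judged only from the input->output mapping over X, then agreeing with A everywhere on X forces the same verdict for sim_A, whereas a restriction to a small subset of X is also matched by clearly non-agentic functions.

```python
# Toy sketch only; every definition here is a made-up stand-in, not part of the LCDT proposal.
ACTIONS = [0, 1, 2]
X = list(range(10))

def utility(x, a):
    """A fixed toy utility; 'goal-directed' below just means maximising this."""
    return -((a - x % 3) ** 2)

def A(x):
    """Stand-in for the real agent: picks the utility-maximising action."""
    return max(ACTIONS, key=lambda a: utility(x, a))

# sim_A contains no agentic machinery at all: it is a frozen lookup table of A's outputs on X.
_table = {x: A(x) for x in X}
def sim_A(x):
    return _table[x]

def behaves_like_agent(f, domain):
    """Hypothetical *behavioural* agenthood test: are f's choices on `domain`
    consistent with maximising the toy utility? It depends only on f's input->output mapping."""
    return all(f(x) == max(ACTIONS, key=lambda a: utility(x, a)) for x in domain)

# Since sim_A(x) == A(x) for all x in X, any test of this form must give the same verdict for both.
assert behaves_like_agent(A, X) and behaves_like_agent(sim_A, X)

# The pre-edit caveat: a restriction can carry far less evidence. On a single input,
# a constant function passes too, so the restricted mapping barely constrains 'agenthood'.
constant = lambda x: A(0)
assert behaves_like_agent(constant, [0])
```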
I can see that it may be possible to represent such a simulation as a group of nodes none of which is individually agentic—but presumably the same could be done with a human. It can’t be ok for LCDT to influence agents based on having represented them as collections of individually non-agentic components.
Even if sim_A is constructed as a Chinese room (w.r.t. agenthood), it’s behaving collectively as an agent.
“it’s just that LCDT will try to simulate that agent exclusively via non-agentic means”—mostly agreed, and agreed that this would be a good thing (to the extent possible).
However, I do think there’s a significant difference between e.g.:
[LCDT will not aim to instantiate agents] (true)
vs
[LCDT will not instantiate agents] (potentially false: they may be side-effects)
Side-effect-agents seem plausible if e.g. (see the toy sketch after this list):
a) The LCDT agent applies adjustments over collections within its simulation.
b) An adjustment taking [useful non-agent] to [more useful non-agent] also sometimes takes [useful non-agent] to [agent].
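As a toy illustration of how (a) and (b) could combine (the scores and the agenthood predicate below are hypothetical placeholders, not anything from the LCDT proposal): a hill-climbing pass that adjusts components purely for usefulness, with no agenthood check anywhere in the loop, can push a component over a notional agent threshold as a pure side effect.

```python
import random

random.seed(0)

# Each simulation component is summarised by two made-up scalars:
# 'usefulness' (what the LCDT agent is optimising) and 'agency', a notional
# degree of goal-directedness that the optimisation loop never looks at.
def adjust(component):
    """One improvement step: a proposed tweak is accepted iff usefulness goes up."""
    usefulness, agency = component
    proposal = (usefulness + random.uniform(0.0, 0.2), agency + random.uniform(-0.1, 0.3))
    return proposal if proposal[0] > usefulness else component

def is_agentic(component, threshold=1.0):
    """Hypothetical agenthood predicate; note that adjust() never consults it."""
    return component[1] >= threshold

component = (0.5, 0.6)            # start from a useful non-agent
for _ in range(20):               # optimise purely for usefulness
    component = adjust(component)

# The loop only ever aimed at [more useful non-agent], but agency drifted upward as a side effect.
print("usefulness:", round(component[0], 2), "| crossed agent threshold:", is_agentic(component))
```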
Here it seems important that LCDT may reason poorly if it believes that it might create an agent. I agree that pre-decision-time processing should conclude that LCDT won’t aim to create an agent. I don’t think it will conclude that it won’t create an agent.
Agreed that finite factored sets seem promising to address any issues that are essentially artefacts of representations. However, the above seem more fundamental, unless I’m missing something.
Assuming this is actually a problem, it struck me that it may be worth thinking about a condition vaguely like:
An LCDT_n agent cuts links at decision time to every agent other than [LCDT_m agents where m > n].
The idea being to specify a weaker condition that does enough forwarding-the-guarantee to allow safe instantiation of particular types of agent while still avoiding deception.
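To make the condition a bit more concrete, here’s a minimal sketch of what the modified link-cutting rule might look like on a labelled causal graph (the graph encoding and the lcdt_level tags are my own illustrative assumptions):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Node:
    name: str
    is_agent: bool = False
    lcdt_level: Optional[int] = None   # set only for LCDT_m agent nodes

def lcdt_n_cut(edges, decision_node, n):
    """Drop decision-time links from decision_node to agent nodes, EXCEPT links
    to LCDT_m agents with m > n. Plain LCDT is the case where no exception is granted."""
    kept = []
    for src, dst in edges:
        if src == decision_node and dst.is_agent:
            exempt = dst.lcdt_level is not None and dst.lcdt_level > n
            if not exempt:
                continue               # cut the link, exactly as ordinary LCDT would
        kept.append((src, dst))
    return kept

# Illustrative graph: an LCDT_1 decision node, a human, an LCDT_2 successor, an environment node.
decision = Node("LCDT_1 decision", is_agent=True, lcdt_level=1)
human = Node("human", is_agent=True)
successor = Node("LCDT_2 successor", is_agent=True, lcdt_level=2)
env = Node("environment")

edges = [(decision, human), (decision, successor), (decision, env)]
print(lcdt_n_cut(edges, decision, n=1))
# Keeps decision->successor (2 > 1) and decision->environment; cuts decision->human,
# so influencing the human (e.g. deceptively) still carries no decision-time weight.
```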
I’m far from clear that anything along these lines would help: it probably doesn’t work, and in any case it doesn’t seem to solve the side-effect-agent problem, since [complete indifference to influence on X] and [robustly avoiding the creation of X] seem fundamentally incompatible.
Thoughts welcome. With luck I’m still confused.