After thinking about it some more, I don’t think this is true.
A concrete example: let’s say there’s a CDT paperclip maximizer, in an environment with Newcomb-like problems, deciding between three options:
1. Don’t hand control to any successor.
2. Hand off control to an “LDT about correlations formed after 7am, CDT about correlations formed before 7am” successor.
3. Hand off control to an LDT successor.
My understanding is that the CDT agent would take the choice that causes the highest number of paperclips to be created (in expectation). If both successors are aligned with the CDT agent, I would expect the CDT agent to choose option #3. The LDT successor agent would be able to gain more resources (and thus create more paperclips) than the other two possible agents, when faced with a Newcomb-like problem with correlations formed before the succession time. The CDT agent can cause this outcome to happen if and only if it chooses option #3.
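To make that concrete, here is a toy version of the calculation I have in mind. It's a minimal sketch: the payoff numbers are made up, and it assumes Omega's prediction ends up matching the policy of whichever agent actually opens the boxes after the handoff.

```python
# Toy Newcomb-like problem whose prediction was formed before 7am (before the handoff).
# Hypothetical numbers; assumes Omega's prediction matches the policy of whichever
# agent actually opens the boxes.
BIG = 1_000_000   # paperclips in the opaque box if Omega predicted one-boxing
SMALL = 1_000     # paperclips in the transparent box

def payoff(policy: str) -> int:
    """Paperclips gained if Omega correctly predicted this policy."""
    if policy == "one-box":      # full LDT successor
        return BIG               # opaque box is full; take only it
    return SMALL                 # opaque box is empty; take both boxes

options = {
    "1: no successor (stay CDT)": payoff("two-box"),
    "2: LDT-after-7am / CDT-before-7am successor": payoff("two-box"),
    "3: full LDT successor": payoff("one-box"),
}
print(max(options, key=options.get))  # -> "3: full LDT successor"
```

On those numbers, option #3 comes out far ahead, which is why I'd expect the CDT agent to pick it.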
I’m not at all sure that son-of-CDT resembles any known logical decision theory, but I don’t see why it would resemble “LDT about correlations formed after 7am, CDT about correlations formed before 7am”.
Edit: I agree that a CDT agent will never agree to precommit to acting like an LDT agent for correlations that have already been created, but I don’t think that determines what kind of successor agent it would choose to create.
> My understanding is that the CDT agent would take the choice that causes the highest number of paperclips to be created (in expectation).
This is true if we mean something very specific by “causes”. CDT picks the action that would cause the highest number of paperclips to be created, if past predictions were uncorrelated with future events.
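Here is a sketch of what that evaluation looks like on a single pre-existing Newcomb problem (hypothetical numbers; p is whatever probability the already-made prediction says “one-box”, and it is held fixed across the candidate actions):

```python
# CDT's causal evaluation: the already-made prediction, and hence the probability p
# that the opaque box is full, does not vary with the action being considered.
BIG, SMALL = 1_000_000, 1_000

def cdt_expected_paperclips(action: str, p: float) -> float:
    opaque = p * BIG                      # expected contents of the opaque box, held fixed
    return opaque if action == "one-box" else opaque + SMALL

for p in (0.0, 0.5, 0.99):
    assert cdt_expected_paperclips("two-box", p) > cdt_expected_paperclips("one-box", p)
# Two-boxing comes out ahead by exactly SMALL for every fixed p, which is why CDT
# two-boxes even though agents predicted to one-box end up with more paperclips.
```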
> I agree that a CDT agent will never agree to precommit to acting like an LDT agent for correlations that have already been created, but I don’t think that determines what kind of successor agent it would choose to create.
If an agent can arbitrarily modify its own source code (“precommit” in full generality), then we can model “the agent making choices over time” as “a series of agents, each choosing which successor-agent follows it at the next time-step”. If son-of-CDT were the same as LDT, this would amount to saying that a self-modifying CDT agent will rewrite itself into an LDT agent, since neither CDT nor LDT assigns special weight to actions that happen inside the agent’s brain rather than outside it.
Yeah, I was implicitly assuming that initiating a successor agent would force Omega to update its predictions about the new agent (and put the $1m in the box). As you say, that’s actually not very relevant, because it’s a property of a specific decision problem rather than CDT or son-of-CDT.
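To spell out why that assumption was doing the work, here is the same toy successor-choice calculation run both ways (made-up numbers again): with the pre-7am prediction held fixed, CDT’s causal evaluation never prefers option #3 to option #2; if initiating a successor makes Omega re-run its prediction, it does.

```python
# Compare the three handoff options under CDT's causal evaluation, with and without
# the assumption that Omega re-predicts the successor (hypothetical numbers).
BIG, SMALL = 1_000_000, 1_000

def successor_policy(option: int) -> str:
    # Options 1 and 2 two-box on pre-7am correlations; option 3 (full LDT) one-boxes.
    return "one-box" if option == 3 else "two-box"

def value(option: int, p_full: float, omega_repredicts: bool) -> float:
    policy = successor_policy(option)
    if omega_repredicts:
        # The handoff itself causally determines the prediction and the box contents.
        p_full = 1.0 if policy == "one-box" else 0.0
    opaque = p_full * BIG
    return opaque if policy == "one-box" else opaque + SMALL

for repredicts in (False, True):
    ranked = sorted([1, 2, 3], key=lambda o: value(o, 0.5, repredicts), reverse=True)
    print(repredicts, ranked)
# False -> [1, 2, 3]: options 1 and 2 tie, beating 3 by SMALL (one-boxing just forfeits it)
# True  -> [3, 1, 2]: option 3 wins, because the handoff causes the opaque box to be filled
```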