I think one issue with the proposal is that the sub-agents need to continue operating in worlds where they believe in a logical contradiction… I think this is something I’m confused about for all agents, and this proposal just brings it to the surface more than usual.
+1 to this. For the benefit of readers: the “weirdness” here is common to CDT agents in general. In some sense they’re acting-as-though they believe in a do()-operated model, rather than their actual beliefs. Part of the answer is that the do()-op is actually part of the planning machinery, and part of the answer is Abram’s CDT=EDT thing, but I haven’t grokked the whole answer deeply enough yet to see how it carries over to this new use-case.
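To spell out the contrast (this is just the standard CDT/EDT formulation, nothing specific to this proposal): CDT scores an action by the do()-surgered distribution, while EDT conditions on the action as evidence:

$$U_{CDT}(a) = \sum_o P(o \mid do(a))\,u(o), \qquad U_{EDT}(a) = \sum_o P(o \mid a)\,u(o)$$

The “weirdness” is that the agent plans using the left-hand object even though its actual epistemic state is the un-surgered model.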
Definitely violates independence, because the combined machine should strictly prefer a lottery over <button-pressed> to certainty of either outcome.
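(Spelling out the independence step, which is just standard vNM reasoning: independence says that if $A \succeq B$, then $\alpha A + (1-\alpha)C \succeq \alpha B + (1-\alpha)C$ for any $C$ and $\alpha \in (0,1]$. Taking $C = A$ gives $A \succeq \alpha B + (1-\alpha)A$, i.e. the better certain outcome is weakly preferred to every lottery between the two. So strictly preferring a <button-pressed> lottery to both certain outcomes is incompatible with independence.)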
Assuming I’m interpreting you correctly, this is non-obvious, because the lottery-choice will be one of many things the two agents negotiate over. So it could be that the negotiation shakes out to the certainty option, with some other considerations counterbalancing elsewhere in the negotiation.
More generally, insofar as the argument in Why Not Subagents? generalizes, the subagents should aggregate into an expected utility maximizer of some sort. But the probabilities of the resulting agent don’t necessarily match the epistemic probabilities of the original model—e.g. the agent’s probability on button state mostly reflects the relative bargaining power of the subagents rather than an epistemic state.
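As a toy illustration of that last point (my own sketch, assuming each subagent only cares about worlds on its own side of the button, and that the probability of pressing is held fixed across the options being compared): let $u_1$ be zero on unpressed-worlds, $u_2$ zero on pressed-worlds, $p$ the epistemic probability of pressing, and $w_1, w_2$ the bargaining weights. The aggregate objective is

$$w_1\,\mathbb{E}[u_1] + w_2\,\mathbb{E}[u_2] = w_1 p\,\mathbb{E}[u_1 \mid \text{pressed}] + w_2 (1-p)\,\mathbb{E}[u_2 \mid \text{unpressed}]$$

which ranks options the same way as an expected utility maximizer whose probability of <button-pressed> is $q = \frac{w_1 p}{w_1 p + w_2 (1-p)}$. Shifting bargaining power shifts $q$ even with the epistemic $p$ held fixed.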