It’s unclear what shutdown means, and this issue seems related to what it means to interfere with the pressing of a shutdown button. From the point of view of almost any formal decision-making framework, creating new separate agents in the environment can’t be distinguished from any other activity. Shutting down the main agent doesn’t automatically cause these environmental agents to stop exerting influence on the world.
A corrigible agent needs to bound its influence, and to avoid actions that escalate the risk of breaking the bounds it sets on its influence. At the same time, when some influence leaks through, it shouldn’t be fought with greater counter-influence. The frame of expected utility maximization seems entirely unsuited for deconfusing this problem.
Yes, ensuring that the agent creates corrigible subagents is another difficulty on top of the difficulties that I explain in this post. I tried to solve that problem in section 14 on p.51 here.