The point about agents in the environment suggests that shutdown is not really about corrigibility; it's more of a test case (application) for corrigibility. If an agent can trivially create other agents in the environment, and poses a risk of actually doing so, then shutting it down doesn't resolve that risk, so you'd need to take care of the more general problem first. Not creating agents in the environment seems closer to the soft-optimization side of corrigibility: preferring less dangerous cognition, not being an optimizer. The agent not contesting shutdown, or assisting with it, is still useful when the risk is absent or doesn't actually trigger, but it's not necessarily related to corrigibility, other than in the sense of being an important task for corrigible agents to support.