This scenario requires a pretty specific (but likely) set of circumstances:
- No time limit on the task
- No other AIs that would prevent it from grabbing power or otherwise be an obstacle to its goals
- The AI assuming that the goal will not be reached even after the AI is shut down (by other AIs, by the same AI after being turned back on, by people, by chance, as the eventual result of the AI's actions before being shut down, etc.)
- An extremely specific value function that ignores everything except one specific goal
- This goal being a core goal, not an instrumental one. For example, the final goal could be "be aligned", and the instrumental goal "do what people ask, because that's what aligned AIs do". Then the order to stop would not be a change of the core goal, but new data about the world, which updates the best strategy for reaching the core goal.
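The distinction in the last point can be sketched as a toy contrast. This is a minimal illustration, not a model of any real system; both agents and their behaviors are hypothetical:

```python
# Toy contrast between two goal structures (hypothetical agents).

def single_goal_agent(stop_ordered: bool) -> str:
    # Value function ignores everything except one specific goal,
    # so a stop order does not change the chosen action.
    return "keep pursuing goal"

def aligned_core_agent(stop_ordered: bool) -> str:
    # Core goal: "be aligned". Obeying people is only instrumental,
    # so a stop order is new data about what an aligned AI would do
    # now, and the strategy updates accordingly.
    return "shut down" if stop_ordered else "do what people ask"
```

Here the stop order never modifies either agent's core goal; for the second agent it merely changes which action best serves that goal.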