It might observe that its goals seem easier to achieve in cities where the magic building is still present,
I think you just answered your own question. Indeed, if the agent found that destroying its instances does not lead to fewer of its goals being achieved, then even a “naturalized” reasoner should not particularly care about destroying itself entirely.
Now, you say the agent would treat instances of itself the same way it would treat an ally. There’s a difference: an ally is someone who behaves in ways that benefit it, while an instance is something whose actions correlate with its output signal. The fact that it has fine-grained control over instances of itself should lead it to treat itself differently from allies. But if the agent has an ally that completely reliably transmits true information to it and performs its requests, then yes, the agent should treat that ally the same way it treats parts of itself.
You can’t win, Vader. If you strike me down, I shall become more powerful than you can possibly imagine.