As I understand it, the shutdown problem isn’t about making the AI correctly decide whether it ought to be shut down. We’d surely like an AI that always makes correct decisions, and if we succeed at that then we don’t need special logic for shutting down: we can just apply the general make-correct-decisions procedure and do whatever the correct thing is.
But the idea here is to have a simpler Plan B that prevents the worst-case scenarios even if the fully-general make-correct-decisions implementation contains a mistake and starts producing incorrect decisions. The goal is to be able to shut the AI down anyway, even when it is not equipped to correctly reason out the pros and cons of shutting down.
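To make that separation concrete, here is a minimal Python sketch with entirely hypothetical names (`Agent`, `ShutdownGuard`): the halt signal is checked outside the agent’s decision procedure, so the operator’s shutdown takes effect no matter what `decide` would have concluded.

```python
import threading

class Agent:
    """Stand-in for the fully-general decision procedure (which may be buggy)."""
    def decide(self, observation):
        # Possibly-incorrect reasoning lives here; the guard never consults it
        # about whether shutting down is a good idea.
        return f"action for {observation!r}"

class ShutdownGuard:
    """Plan B: an interlock that sits outside the agent's decision loop."""
    def __init__(self, agent):
        self.agent = agent
        self._halt = threading.Event()  # set by the operator, never by the agent

    def operator_shutdown(self):
        self._halt.set()

    def step(self, observation):
        if self._halt.is_set():  # checked before the agent reasons at all
            raise SystemExit("operator-initiated shutdown")
        return self.agent.decide(observation)

guard = ShutdownGuard(Agent())
print(guard.step("sensor reading"))
guard.operator_shutdown()
try:
    guard.step("next reading")  # halts regardless of what decide() would say
except SystemExit as exc:
    print(exc)
```

Of course, this only illustrates the intended control flow; the hard part of the shutdown problem is ensuring that a capable agent doesn’t act to disable or route around the guard.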
Yes, and this stems from the idea that if we can reliably get an AI system to initiate a shutdown when it recognizes potential harm to its users, even in worst-case scenarios, then we may eventually move beyond the need for a literal ‘shutdown button’ or mechanism, and instead aim for a more advanced version in which the AI pauses and presents alternative options.
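Here is a rough sketch of that escalation ladder (act, pause and present alternatives, self-initiated shutdown). Everything in it is hypothetical: the names (`Proposal`, `review`), the thresholds, and above all the assumed existence of a trustworthy harm estimator, which is itself most of the problem.

```python
from dataclasses import dataclass

@dataclass
class Proposal:
    action: str
    predicted_harm: float  # assumed output of some harm estimator, in [0, 1]

# Hypothetical thresholds; choosing them well is the hard, unsolved part.
PAUSE_THRESHOLD = 0.3
SHUTDOWN_THRESHOLD = 0.8

def review(proposal: Proposal, alternatives: list[Proposal]) -> str:
    """Escalate from acting, to pausing with options, to shutting down."""
    if proposal.predicted_harm >= SHUTDOWN_THRESHOLD:
        return "shutdown: predicted harm too high to proceed at all"
    if proposal.predicted_harm >= PAUSE_THRESHOLD:
        options = ", ".join(a.action for a in alternatives)
        return f"pause: asking the user to choose among: {options}"
    return f"execute: {proposal.action}"

print(review(Proposal("deploy patch", 0.1), []))
print(review(Proposal("delete records", 0.5),
             [Proposal("archive records", 0.05)]))
print(review(Proposal("override safety interlock", 0.9), []))
```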
I have experimented with numerous simpler scenarios and consistently arrived at the conclusion that the AI should be able to willingly initiate a shutdown, which is not simple. When we scale this up to worst-case scenarios, we enter the same territory I am advocating for: building a mechanism that makes the failure scenarios understandable from the outset.