It’s a bit worse than that. The “hope” seems to be more along the lines of:
“The existence of sufficient coherence is not certain; if it is not present, the system implementing CEV should execute a “controlled shutdown” rather than behaving unpredictably.”—Nick Tarleton, “Coherent Extrapolated Volition: A Meta-Level Approach to Machine Ethics”
Never mind how a nascent, valueless AI is supposed to convince itself to go back into the box.