Add a term granting a large disutility for deaths, and this should do the trick.
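Schematically (this formalization is my own gloss, assuming the AI maximizes the sum of the current utility functions $u_i$ of the people $P$ it is considering over outcomes $o$), I mean something like

$$U_{\text{AI}}(o) = \sum_{i \in P(o)} u_i(o) \; - \; \lambda \, D(o),$$

where $D(o)$ counts the deaths in outcome $o$ and $\lambda$ is a very large constant.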
What if death isn’t well-defined? Suppose the AI has the option of cryonically freezing a person to save their life—but once frozen, that person no longer has any “current” utility function, so the AI can disregard them completely. Situations like this also demonstrate a more general problem: trying to satisfy someone’s utility function may have the unavoidable side-effect of changing their utility function. These side-effects may be complex enough that the person does not foresee them, and it may not be possible for the AI to explain them to the person.
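To spell the loophole out in the schematic notation above: if $P(o)$ contains only people who have a current utility function in outcome $o$, and a frozen person counts neither as a member of $P(o)$ nor as a death in $D(o)$, then

$$j \text{ frozen in } o \;\Longrightarrow\; j \notin P(o) \text{ and } j \text{ not counted in } D(o),$$

so neither the $u_j$ term nor the death penalty gives the AI any reason to take $j$ into account. (The notation is mine; the point doesn't depend on the exact aggregation rule.)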
I think your “simple hack” is not actually that simple or well-defined.
It’s simple, it’s well defined—it just doesn’t work. Or at least, it doesn’t work in the naive way I was hoping.
The original version of the hack—on one-shot oracle machines—worked reasonably well. This version needs more work. And I shouldn’t have mentioned deaths here; that whole subject requires its own separate treatment.