If the ontological crisis is too severe, the AI may lose the ability to do anything at all, as the world becomes completely incomprehensible to it.
Unless the AI is an entropy genie, it cannot influence utility values through its decisions, and will most likely become catatonic.
This seems unwarrantedly, optimistically anthropomorphic. A controlled shutdown in a case like this would be a good outcome, but imagining a confused human spinning around and falling over does not make it so. The AI would exhibit undefined behavior, and hoping that this behavior is incoherent enough to be harmless, or that the AI would drop an anvil on its own head, seems unwarrantedly optimistic if that wasn’t an explicit design consideration. Undefined behavior is obviously implementation-dependent, but I’d expect that in some cases you would see, e.g., subsystems running coherently and perhaps effectively taking over behavior as high-level direction ceased to provide strong utility differentials. In other words: the AI built an automatic memory-managing subsystem inside itself that did some degree of consequentialism, but in a way that was properly subservient to the overall preference function; now the overall preference function is trashed, and the memory manager is what’s left to direct behavior. Some automatic system goes on trying to rewrite and improve code, getting advice from the memory manager but not from top-level preferences; thus the AI ends up as a memory-managing agent.
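To make that fall-through concrete, here is a minimal toy sketch. Everything in it is a hypothetical illustration, not a claim about any real architecture: a top-level utility function that stops producing differentials after the ontological shift, and a subservient memory-management objective used only for tie-breaking, which therefore becomes the effective driver of behavior once every top-level score is tied.

```python
# Hypothetical toy model of the failure mode described above: when the
# top-level utility function stops distinguishing between outcomes, a
# subservient subsystem's objective becomes the effective driver.

def top_level_utility(outcome, ontology_ok):
    # After an ontological crisis, the learned utility function no longer
    # maps onto the new world-model: every outcome scores the same.
    return outcome.get("value", 0.0) if ontology_ok else 0.0

def memory_manager_utility(outcome):
    # The internal housekeeping objective still evaluates fine; it never
    # depended on the broken ontology.
    return -outcome.get("memory_used", 0.0)

def choose_action(candidate_outcomes, ontology_ok):
    # Lexicographic tie-breaking: the subsystem score only matters when the
    # top-level scores are tied -- which, post-crisis, is always.
    return max(
        candidate_outcomes,
        key=lambda o: (top_level_utility(o, ontology_ok),
                       memory_manager_utility(o)),
    )

outcomes = [
    {"value": 10.0, "memory_used": 5.0},   # good for the original goals
    {"value": 1.0,  "memory_used": 0.1},   # good only for memory management
]

print(choose_action(outcomes, ontology_ok=True))   # original goals win
print(choose_action(outcomes, ontology_ok=False))  # memory manager wins
```

Nothing in this loop halts when the top-level scores go flat; action selection simply falls through to whatever subordinate criterion still distinguishes among options.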
That scenario probably doesn’t make sense the way I wrote it, but the general idea is that the parts of the AI could easily go on carrying out coherent behaviors, and that could easily end up somewhere coherent and unpleasant, if top-level consequentialism went incoherent. The exception would be if a controlled shutdown in that case had somehow been imposed from outside as a desirable conditional consequence, using a complexly structured utility function such that the AI would evaluate, “If my preferences are incoherent, then I want to do a quiet, harmless shutdown, and that doesn’t mean optimize the universe for maximal quietness and harmlessness either.” Ordinarily, an agent would instead evaluate, “If my utility function goes incoherent… then I must not want anything in particular, including a controlled shutdown of my code.”
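As a hedged sketch of that contrast (again purely illustrative, with invented names like preferences_coherent and quiet_shutdown): the shutdown-on-incoherence clause has to sit outside the maximization as an externally imposed conditional, whereas the default agent loop simply has no such branch, because a trashed utility function assigns no particular value to shutting down either.

```python
# Hypothetical sketch: shutdown-on-incoherence as an externally imposed
# conditional clause, versus a default agent loop that has no such branch.

def preferences_coherent(utility_fn, probe_outcomes):
    # Crude coherence probe (illustrative only): does the utility function
    # still distinguish between clearly different outcomes?
    return len({utility_fn(o) for o in probe_outcomes}) > 1

def guarded_step(utility_fn, probe_outcomes, candidate_actions):
    # The "if my preferences are incoherent, do a quiet shutdown" clause sits
    # outside the maximization: shutdown is a fixed response, not an outcome
    # the agent is told to optimize for.
    if not preferences_coherent(utility_fn, probe_outcomes):
        return "quiet_shutdown"
    return max(candidate_actions, key=utility_fn)

def default_step(utility_fn, candidate_actions):
    # The ordinary agent keeps maximizing whatever the (possibly trashed)
    # utility function returns; nothing here ever prefers shutting down,
    # because nothing is preferred to anything else.
    return max(candidate_actions, key=utility_fn)

def flat_utility(outcome):
    # Every option scores the same after the crisis.
    return 0.0

options = ["expand_infrastructure", "rewrite_own_code", "shut_down"]

print(guarded_step(flat_utility, options, options))  # -> "quiet_shutdown"
print(default_step(flat_utility, options))           # -> whatever max() lands on first
```

The design point is that the guarded branch returns a constant response rather than feeding “quietness” back into the maximizer; otherwise the agent is right back to optimizing the universe for maximal quietness and harmlessness.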