How do we know that the AI has a correct and reasonable understanding of the risk of harming a human? What if the AI has an incorrect definition of a human, or deliberately corrupts its definition of a human?