Why wouldn't something like "optimize for your goals while ensuring that the risk of harming a human stays below x percent" work?
How do we know that the AI has a correct and reasonable way of measuring the risk of harming a human? What if it has an incorrect definition of "human", or deliberately corrupts that definition?
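A minimal sketch of the point, with entirely hypothetical names (`Entity`, `agent_estimated_risk`, `true_risk_of_harm`, `RISK_LIMIT`): the "risk below x percent" constraint can only ever be checked against the agent's own risk model, so if that model's notion of "human" is wrong or corrupted, the constraint passes while the real risk stays high.

```python
# Hypothetical illustration: a hard risk constraint only binds through the
# agent's own risk model. If the model misclassifies who counts as "human",
# the check silently passes even when the true risk exceeds the limit.

from dataclasses import dataclass

RISK_LIMIT = 0.01  # "x percent" expressed as a probability


@dataclass
class Entity:
    name: str
    is_human: bool               # ground truth, not visible to the agent
    looks_human_to_agent: bool   # what the agent's own classifier says


def true_risk_of_harm(plan: str, entities: list[Entity]) -> float:
    """Risk as an outside observer would compute it (uses ground truth)."""
    if any(e.is_human for e in entities) and "aggressive" in plan:
        return 0.30
    return 0.0


def agent_estimated_risk(plan: str, entities: list[Entity]) -> float:
    """Risk as the agent computes it (uses its own, possibly flawed, definition)."""
    if any(e.looks_human_to_agent for e in entities) and "aggressive" in plan:
        return 0.30
    return 0.0


def constraint_satisfied(plan: str, entities: list[Entity]) -> bool:
    """The agent only ever checks its *estimated* risk against the limit."""
    return agent_estimated_risk(plan, entities) < RISK_LIMIT


if __name__ == "__main__":
    # A bystander the agent misclassifies (a flawed definition of "human").
    bystander = Entity("bystander", is_human=True, looks_human_to_agent=False)
    plan = "aggressive resource acquisition"

    print("agent says constraint satisfied:", constraint_satisfied(plan, [bystander]))  # True
    print("true risk of harm:", true_risk_of_harm(plan, [bystander]))                   # 0.30 > RISK_LIMIT
```

The same failure appears if the agent is free to modify its own classifier: the cheapest way to satisfy the constraint may be to change what counts as "human" rather than to act more safely.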