Is there a concept of a safe, partially aligned AI? One that recognizes the limitations of its own understanding of humans [or humanity] and limits its actions to those it knows, with high probability, fall within that understanding?
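To make the question concrete, here is a minimal, purely illustrative sketch of what I mean by "limits its actions to what it knows with high probability": an agent that estimates how confident it is that a given action falls inside the region it actually understands, and defers otherwise. All names here (`confidence_in_model`, `CONFIDENCE_THRESHOLD`, `defer_to_human`) are made up for illustration, not from any existing system.

```python
import random

# Hypothetical sketch: an agent that only takes actions it believes,
# with high probability, fall inside the region it understands.

CONFIDENCE_THRESHOLD = 0.95  # assumed bar for "high probability"

def confidence_in_model(action, state):
    """Stand-in for the agent's estimate that its model of human values and
    consequences covers this (state, action) pair. A real system might use
    ensemble disagreement, calibration, or out-of-distribution detection."""
    return random.random()  # placeholder estimate

def defer_to_human(actions, state):
    """Fallback when the agent judges itself out of its depth."""
    print(f"Deferring to a human: not confident about {actions!r} in {state!r}")
    return None

def choose(actions, state):
    # Keep only actions the agent is confident it understands well enough.
    safe = [a for a in actions
            if confidence_in_model(a, state) >= CONFIDENCE_THRESHOLD]
    if not safe:
        return defer_to_human(actions, state)
    # Among "understood" actions, pick one by whatever objective applies.
    return safe[0]

if __name__ == "__main__":
    print(choose(["reply politely", "reorganize the economy"], state="chat"))
```

The interesting part is the abstention branch: when no action clears the confidence bar, the agent does nothing on its own and hands control back. Is this kind of self-limiting behavior an established idea in alignment research, and does it have a name?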