I don’t think “inner alignment” is applicable here.
If the clone behaves indistinguishably from the human it is based on, then there is simply nothing more to say. It doesn’t matter what is going on inside.
Right, I agree with that. The problem is: "behaves indistinguishably" for how long? You can't guarantee that it won't stop acting that way in the future, and that is exactly what deceptive alignment predicts.