This solves nothing. If we knew the failure mode exactly, we could forbid it explicitly, rather than resort to some automatic self-destruct system. We, as humans, do not know exactly what the AI will do to become Unfriendly; that's the key point to understand. Since we don't know the failure mode, we can't design a superstition to stop it, any more than we can outright prohibit it.
This is, in fact, worse than explicit rules. It only works if the AI actively wants to do something undesirable; it does nothing when the harm arises as a side effect.