If you include these sorts of things as guarantees, then I think the training process won't provide enough guarantees by default, but we might be able to get it to, e.g. by adversarial training.
I do include those sorts of things as guarantees. I do think it's possible that adversarial training will provide such guarantees, but it's difficult for the reasons I've mentioned, and a sufficiently capable adversary will itself need a good deal of transparency into the system in order to come up with cases where the system will fail.
Background beliefs