Since my model is more accurate, ~10 times out of 11 the input will correspond to an “adversarial” attack on your model.
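To spell out where the ~10/11 figure could come from (a sketch, assuming the two models' errors are independent and that your model's error rate is about ten times mine; neither ratio is stated explicitly above):

$$
P(\text{your model errs} \mid \text{exactly one of our models errs}) \;=\; \frac{10p\,(1-p)}{10p\,(1-p) + p\,(1-10p)} \;\approx\; \frac{10}{11} \quad \text{for small } p,
$$

where $p$ is my model's error rate and $10p$ is yours.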
This argument (or the uncorrelation assumption) proves too much. A perfect cat detector performs better than one that also classifies close-ups of the sun as cats. Yet close-ups of the sun do not qualify as adversarial examples, as they are far from any likely starting image.
I’m not sure what you’re trying to say. I’m using a broad definition of “adversarial example”: if an attacker can deliberately make a classifier misclassify an input, that input is an adversarial example. So if an adversary can find a close-up of the sun that your cat detector calls a cat, that’s an adversarial example by my definition. I think this is similar to the definition Ian Goodfellow uses.
Edit:
An adversarial example is an input to a machine learning model that is intentionally designed by an attacker to fool the model into producing an incorrect output.
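For concreteness, here is a minimal sketch (not from the original discussion) of what “deliberately making a classifier misclassify” can look like in practice, using the fast gradient sign method from Goodfellow et al.; the toy two-class “cat detector”, the random input, and the epsilon value are all illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fgsm_attack(model, x, label, epsilon=0.03):
    """Perturb x by one signed-gradient step so the model is pushed away
    from `label` (the fast gradient sign method)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), label)
    loss.backward()
    # Move each pixel by +/- epsilon in whichever direction increases the loss.
    return (x + epsilon * x.grad.sign()).detach()

# Illustrative toy "cat detector": class 0 = cat, class 1 = not-cat.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 2))
x = torch.rand(1, 3, 32, 32)   # a stand-in for an input image
label = torch.tensor([0])      # treat "cat" as the label the attacker wants to break
x_adv = fgsm_attack(model, x, label)
print(model(x).argmax(dim=1), model(x_adv).argmax(dim=1))  # predictions may now differ
```

The perturbation is bounded by epsilon, so the adversarial input stays close to the starting image even though the model’s output can flip.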