Amusing tidbit, maybe worth keeping in mind when writing for an ML audience: the connotations of the terms “adversarial examples” and “adversarial training” run deep :-)
I engaged with the paper and related blog posts for a couple of hours. It took a long time before my brain accepted that “adversarial examples” here doesn’t mean what it usually means when I encounter the term (i.e. “small” changes to an input that change the classification, for some definition of small).
There were several instances when my brain went “Wait, that’s not how adversarial examples work”, followed by brief confusion, followed by “Right, that’s because my cached concept of X is only true for ‘adversarial examples as commonly defined in ML’, not for ‘adversarial examples as defined here’.”
This comes from the fact that you assumed (from reading ML literature) that “adversarial example” had a more specific definition than it really does, right? Note that the Alignment Forum definition of “adversarial example” uses the misclassified panda as an example.