One needs to be very careful when proposing a heuristically-motivated adversarial defense. Otherwise, one runs the risk of being the next example in a paper like this one, which took a wide variety of heuristic defense papers and showed that stronger attacks completely break most of the defenses.
In fact, Section 5.2.2 of that paper breaks the defense from an earlier paper that used image cropping and rescaling, bit-depth reduction, and JPEG compression as baseline defenses, alongside two new defense methods called total variance minimization and image quilting. Where the original defense paper claimed 60% robust top-1 accuracy, the attack circumvents the defense entirely, dropping robust accuracy to 0%.
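To make the failure mode concrete, here is a minimal sketch (my own, not code from either paper) of a JPEG-compression preprocessing defense in PyTorch, with a comment on the BPDA-style trick the attack paper uses to differentiate through it. The model and quality setting are placeholders.

```python
import io

import torch
from PIL import Image
from torchvision import transforms

to_pil = transforms.ToPILImage()
to_tensor = transforms.ToTensor()

def jpeg_round_trip(x: torch.Tensor, quality: int = 75) -> torch.Tensor:
    """Lossily encode a CHW float tensor in [0, 1] as JPEG and decode it back."""
    buf = io.BytesIO()
    to_pil(x).save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return to_tensor(Image.open(buf))

def defended_predict(model, x: torch.Tensor) -> torch.Tensor:
    # The round trip is non-differentiable, which is why naive gradient
    # attacks stall on it. BPDA sidesteps this: since jpeg_round_trip(x)
    # is approximately x, the attacker backpropagates through it as if
    # it were the identity function, and the defense falls.
    return model(jpeg_round_trip(x).unsqueeze(0))
```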
If compression were all you needed to solve adversarial robustness, the adversarial robustness literature would already know it.
My current understanding is that an adversarial defense paper should measure certified robustness, which guarantees that no future attack within the stated threat model can break the defense. This paper gives an example of certified robustness. However, I’m not a total expert in adversarial defense methodology, so I’d encourage anyone considering writing an adversarial defense paper to talk to such an expert.
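For a sense of what "certified" means in practice, here is a toy sketch of randomized smoothing (in the spirit of Cohen et al. 2019), one well-known certification route: vote over Gaussian-perturbed copies of the input, and the vote margin yields an L2 radius within which the prediction provably cannot change. `base_model` and the sample count are assumptions for illustration.

```python
import torch
from scipy.stats import norm

def smoothed_predict_and_certify(base_model, x, sigma=0.25, n=1000, num_classes=10):
    """Majority-vote over Gaussian-perturbed copies of x; certify an L2 radius."""
    with torch.no_grad():
        noise = torch.randn(n, *x.shape) * sigma
        votes = base_model(x.unsqueeze(0) + noise).argmax(dim=1)
        counts = torch.bincount(votes, minlength=num_classes)
    top_class = counts.argmax().item()
    # Point estimate of the top-class probability; a real implementation
    # uses a lower confidence bound (e.g. Clopper-Pearson) instead.
    p_a = min(counts[top_class].item() / n, 1 - 1e-6)
    radius = sigma * norm.ppf(p_a) if p_a > 0.5 else 0.0  # radius 0 = abstain
    return top_class, radius
```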
I would bet money, maybe $2k, that I can create a robust system, using a combination of all the image compression techniques I can conveniently find and a variety of ML models with self-consistency, that achieves >50% robust accuracy even after another year of attacks.
Edit: on inputs that don’t look obviously corrupted or mangled to an average human
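Concretely, the shape of the system I have in mind is something like this sketch (the models, compression functions, and voting threshold are all placeholders, not an implementation):

```python
from collections import Counter

def self_consistent_predict(models, compressions, x):
    """Run every (compression, model) pair; return the majority label,
    or None (abstain) when no single label wins an outright majority."""
    # Assumes each model maps an image to a hashable class label.
    votes = Counter(model(compress(x)) for model in models for compress in compressions)
    label, count = votes.most_common(1)[0]
    return label if count > sum(votes.values()) / 2 else None
```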
In how many months?