Yes. We’re just aiming for a distillation of the overseer’s judgments, but that’s what existing imagenet models are anyway, so we’ll be vulnerable to adversarial examples for the same reason.
Current theme: default
Less Wrong (text)
Less Wrong (link)
Yes. We’re just aiming for a distillation of the overseer’s judgments, but that’s what existing imagenet models are anyway, so we’ll be vulnerable to adversarial examples for the same reason.