Generally, it seems fair to say that adversarial robustness is significantly more challenging than the non-adversarial case, and that it does not simply go away on its own with scale.
I don’t think we know that. (How big is KataGo anyway, 0.01b parameters or so?) We don’t have much scaling research on adversarial robustness, but what we do have suggests that adversarial robustness does increase with scale; the isoperimetry theory claims that scaling much larger than we currently do will be sufficient (and may be necessary); and the fact that a staggeringly large adversarial-defense literature has yet to yield any defense that holds up longer than a year or two before an attack cracks it & gets added to CleverHans suggests that the goal of adversarial defenses for small NNs may be inherently impossible. (There is also a certain academic smell to adversarial research which it shares with other areas that either have been best solved by scaling or, like continual learning, look increasingly like they will be soon.)
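To spell out the isoperimetry point (assuming it refers to Bubeck & Sellke’s “universal law of robustness”, which is my reading rather than something stated in the comment): roughly, for $n$ training points in $d$ dimensions drawn from a suitably isoperimetric distribution, any model with $p$ parameters that fits the data below the label-noise level must have Lipschitz constant

$$\mathrm{Lip}(f) \;\gtrsim\; \sqrt{\frac{nd}{p}},$$

so achieving a smooth, $O(1)$-Lipschitz (i.e. robust) fit requires on the order of $p \gtrsim nd$ parameters, i.e. far more overparameterization than is standard today.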
I don’t think it’s fair to compare parameter sizes between language models and models for other domains, such as games or vision. E.g., I believe AlphaZero is also only in the range of hundreds of millions of parameters? (A quick Google didn’t give me the answer.)
I think there is a real difference between adversarial and natural distribution shifts, and without adversarial training, even large networks struggle with adversarial shifts. So I don’t think this is a problem that would go away with scale alone; at least I don’t see evidence for it in current data (the failure of defenses for small models is not evidence that size alone will succeed for larger ones).
One way to see this is to look at the figures in this plotting playground of “accuracy on the line”. This is the figure for natural distribution shift: the green models are the ones trained with more data, and they do seem to be “above the curve” (significantly so for CLIP, which corresponds to the two green dots reaching ~53% and ~55% natural-distribution-shift accuracy compared to ~60% and ~63% vanilla accuracy).
In contrast, if you look at adversarial perturbations, you can see that actual adversarial training (bright orange) or other robustness interventions (brown) are much more effective than more data (green), which in fact mostly underperforms.
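To make concrete what “actual adversarial training” refers to here, as opposed to simply training on more (clean or naturally shifted) data, here is a minimal sketch in PyTorch using a single FGSM step; the function names and hyperparameters are illustrative only, and practical pipelines typically use multi-step PGD attacks instead:

```python
# Sketch: standard vs. FGSM-style adversarial training step (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

def fgsm_perturb(model, x, y, eps=8 / 255):
    """Generate an L-infinity adversarial example with a single gradient-sign step."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # Step each input feature in the direction that increases the loss,
    # then clip back to the valid [0, 1] image range.
    return (x_adv + eps * x_adv.grad.sign()).clamp(0, 1).detach()

def train_step(model, x, y, optimizer, adversarial=True):
    """One optimization step, on clean or adversarially perturbed inputs."""
    inputs = fgsm_perturb(model, x, y) if adversarial else x
    optimizer.zero_grad()
    loss = F.cross_entropy(model(inputs), y)
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    # Toy model and random data, just to show the shapes involved.
    model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    x = torch.rand(8, 3, 32, 32)
    y = torch.randint(0, 10, (8,))
    print(train_step(model, x, y, optimizer, adversarial=True))
```

The point of the contrast: “more data” only changes which clean examples the model sees, while adversarial training changes the optimization objective itself to cover worst-case perturbations.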
(I know you focused on “more model”, but I think to a first approximation “more model” and “more data” should have similar effects.)