Perhaps; I don’t think they tried that (though I haven’t read the paper in detail).
If by distillation you mean “train a smaller student net using the current net”, I’d expect that you’d still have some robustness, but less of it. (But I’d expect removing 30 random neurons would still not make much of a difference, unless you distilled down to a really small model.)
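For concreteness, here’s a rough sketch of what I have in mind by “distill into a smaller student, then zero out 30 random neurons and see what happens.” The layer sizes, the choice of distilling via softened logits, and all the helper names are my own illustrative assumptions, not anything from the paper:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_mlp(hidden):
    # Toy MLP; sizes are arbitrary placeholders.
    return nn.Sequential(nn.Linear(784, hidden), nn.ReLU(), nn.Linear(hidden, 10))

teacher = make_mlp(hidden=2048)   # the "current net" (assume already trained)
student = make_mlp(hidden=256)    # smaller student to distill into

def distill_step(x, optimizer, T=2.0):
    """One distillation step: match the teacher's softened logits via KL divergence."""
    with torch.no_grad():
        t_logits = teacher(x)
    s_logits = student(x)
    loss = F.kl_div(F.log_softmax(s_logits / T, dim=-1),
                    F.softmax(t_logits / T, dim=-1),
                    reduction="batchmean") * T * T
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def ablate_random_neurons(model, k=30):
    """Zero k random hidden units (rows of the first Linear layer) in place."""
    first_linear = model[0]
    idx = torch.randperm(first_linear.out_features)[:k]
    with torch.no_grad():
        first_linear.weight[idx] = 0.0
        first_linear.bias[idx] = 0.0
    return idx
```

My guess is that running the ablation on the 2048-unit teacher barely moves accuracy, while on the 256-unit student (where 30 neurons is a much larger fraction of the layer) you’d start to see a real drop.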