I expect that after distillation, this robustness goes away? (“Perfection is achieved when there is nothing left to take away.”)
Perhaps; I don’t think they tried that (though I haven’t read the paper in detail).
If by distillation you mean “train a smaller student net using the current net”, I’d expect the student would still have some robustness, but less of it. (I’d still expect that removing 30 random neurons wouldn’t make much of a difference, unless you distilled down to a really small model.)
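To make the two things being discussed concrete, here is a minimal sketch of (1) soft-target distillation of a “current net” into a smaller student and (2) the “remove 30 random neurons” robustness check. Everything in it (model sizes, the temperature, PyTorch as the framework, the toy random input) is an illustrative assumption on my part, not the setup from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_mlp(hidden_dim):
    # Toy MLP classifier; the sizes are arbitrary.
    return nn.Sequential(nn.Linear(784, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, 10))

teacher = make_mlp(1024)  # the "current net"
student = make_mlp(128)   # smaller student to distill into

def distillation_loss(x, temperature=4.0):
    # Standard soft-target distillation: KL divergence between the
    # temperature-softened teacher and student output distributions.
    # Training the student to minimize this is what "distillation" means here.
    with torch.no_grad():
        teacher_logits = teacher(x)
    student_logits = student(x)
    return F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2

def remove_random_neurons(model, n_neurons=30):
    # Zero the incoming weights and bias of n_neurons random hidden units,
    # which silences them permanently (the "remove 30 random neurons" check).
    hidden_layer = model[0]  # first Linear layer of the toy MLP
    idx = torch.randperm(hidden_layer.out_features)[:n_neurons]
    with torch.no_grad():
        hidden_layer.weight[idx] = 0.0
        hidden_layer.bias[idx] = 0.0
    return idx

# Rough robustness check: how often do the student's predictions survive ablation?
x = torch.randn(64, 784)
before = student(x).argmax(dim=-1)
remove_random_neurons(student, n_neurons=30)
after = student(x).argmax(dim=-1)
print("prediction agreement after ablation:", (before == after).float().mean().item())
```

The intuition above is just that the smaller the student (128 hidden units here), the larger a fraction of its capacity 30 random neurons represent, so the less of this robustness you’d expect to survive.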