Take an EfficientNet model with >= 99% accuracy on MNIST digit classification. What is the largest possible change in the probability assigned to some class between two images, which differ only in the least significant bit of a single pixel? Prove your answer before 2023.
You aren’t counting the fact that you can pretty easily bound this based on the fact that image models are Lipschitz, right? Like, you can just ignore the ReLUs and you’ll get an upper bound by looking at the weight matrices. And I believe there are techniques that let you get tighter bounds than this.
If you can produce a checkable proof of this over the actual EfficientNet architecture, I’d pay out the prize. Note that this uses floating-point numbers, not the reals!
I’m confused because I understand you to be asking for a bound on the derivative of an EfficientNet model, but it seems quite easy (though perhaps kind of a hassle) to get this bound.
I don’t think the floating point numbers matter very much (especially if you’re ok with the bound being computed a bit more loosely).
Ah, crux: I do think the floating-point matters! Issues of precision, underflow, overflow, and NaNs bedevil model training and occasionally deployment-time behavior. By analogy, if we deploy an AGI the ideal mathematical form of which is aligned we may still be doomed, even it’s plausibly our best option in expectation.
Checkable meaning that I or someone I trust with this has to be able to check it! Maxwell’s proposal is simple enough that I can reason through the whole thing, even over float32 rather than R, but for more complex arguments I’d probably want it machine-checkable for at least the tricky numeric parts.
Am I correct that you wouldn’t find a bound acceptable, you specifically want the exact maximum?
You aren’t counting the fact that you can pretty easily bound this based on the fact that image models are Lipschitz, right? Like, you can just ignore the ReLUs and you’ll get an upper bound by looking at the weight matrices. And I believe there are techniques that let you get tighter bounds than this.
If you can produce a checkable proof of this over the actual EfficientNet architecture, I’d pay out the prize. Note that this uses floating-point numbers, not the reals!
Would running the method in this paper on EfficientNet count?
What if we instead used a weaker but still sound method (e.g. based on linear programs instead of semidefinite programs)?
On a quick skim it looks like that fails both “not equivalent to executing the model” and the
float32
vs R problem.It’s a nice approach, but I’d also be surprised if it scales to maintain tight bounds on much larger networks.
By “checkable” do you mean “machine checkable”?
I’m confused because I understand you to be asking for a bound on the derivative of an EfficientNet model, but it seems quite easy (though perhaps kind of a hassle) to get this bound.
I don’t think the floating point numbers matter very much (especially if you’re ok with the bound being computed a bit more loosely).
Ah, crux: I do think the floating-point matters! Issues of precision, underflow, overflow, and NaNs bedevil model training and occasionally deployment-time behavior. By analogy, if we deploy an AGI the ideal mathematical form of which is aligned we may still be doomed, even it’s plausibly our best option in expectation.
Checkable meaning that I or someone I trust with this has to be able to check it! Maxwell’s proposal is simple enough that I can reason through the whole thing, even over
float32
rather than R, but for more complex arguments I’d probably want it machine-checkable for at least the tricky numeric parts.I’d award half the prize for a non-trivial bound.