This is some evidence against the “scaling hypothesis”, i.e. evidence that modern deep learning is missing something non-trivial and important that would be needed to reach AGI.
The usual response is just: “to take over the world, you don’t actually need to be robust to white-box advexes, only somewhat resistant to black-box advexes.”
My point is not that there is a direct link between adversarial robustness and taking over the world, but that the lack of adversarial robustness is (inconclusive) evidence that deep learning is qualitatively worse than human intelligence in some way (a weakness that would also manifest in ways other than adversarial examples). If that is true, it reduces the potential risk from such systems (maybe not to 0, but it substantially weakens the case for the more dramatic take-over scenarios).
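For concreteness, here is a minimal sketch of what a “white-box advex” means in practice: the attacker has full access to the model’s gradients and perturbs the input to increase the loss (FGSM-style). This assumes a pretrained PyTorch classifier `model` and an input batch `x` with labels `y`; it is an illustrative sketch, not anyone’s specific attack setup.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=0.03):
    """One-step Fast Gradient Sign Method: nudge x in the direction that
    increases the loss, using white-box access to the model's gradients."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    # Step along the sign of the input gradient, then clamp to a valid pixel range.
    x_adv = x + eps * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```

A black-box attack, by contrast, would only query the model’s outputs; the point of the distinction above is that humans appear far more robust to both kinds of perturbation than current deep networks.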