VojtaKovarik comments on AI Safety in a World of Vulnerable Machine Learning Systems

VojtaKovarik 24 Jul 2023 15:07 UTC
LW: 3 AF: 2
“For the purpose of this section, we will consider adversarial robustness to be solved if systems cannot be practically exploited to cause catastrophic outcomes.”
Regarding the predictions, I want to make the following quibble: According to the definition above, one way of “solving” adversarial robustness is to make sure that nobody tries to catastrophically exploit the system in the first place. (In particular, exploitable AI that takes over the world is no longer exploitable.)
So, with this definition, a lot rests on how you distinguish between “cannot be exploited” and “will not be exploited”.
And on reflection, I think that for some people, this is close to being a crux regarding the importance of all this research.