My general take is that anything relying on safety properties / system invariants being written out in formal languages / interpretable code seems not that helpful, because we just don’t get such clean safety properties / system invariants.
I am pretty enthusiastic about relying on learned models of safety properties / system invariants that are treated as good but not perfect specifications, and fuzzing / testing with respect to those learned models seems great.
Testing with respect to learned models sounds great, and I expect there’s lots of interesting GAN-like work to be done in online adversarial test generation.
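As a rough illustration of the kind of thing I have in mind, here is a minimal sketch of adversarial test generation against a learned safety model: treat the learned model as a differentiable stand-in for "is this state unsafe?", search for inputs it scores as likely violations, and feed those candidates to the real system as tests. Everything here (`SafetyModel`, `generate_adversarial_tests`, the dimensions) is a hypothetical placeholder, not an existing pipeline.

```python
# Hypothetical sketch: gradient-based adversarial test generation against a
# learned (imperfect) safety model. Candidates found here would then be run
# through the real system as test cases.
import torch
import torch.nn as nn


class SafetyModel(nn.Module):
    """Learned model mapping a system input to P(safety violation). Placeholder."""

    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.net(x))


def generate_adversarial_tests(safety_model: SafetyModel, dim: int,
                               n_candidates: int = 32, steps: int = 200,
                               lr: float = 0.05) -> torch.Tensor:
    """Gradient-ascent search for inputs the learned model flags as unsafe."""
    x = torch.randn(n_candidates, dim, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        # Maximise predicted violation probability (minimise its negative).
        loss = -safety_model(x).mean()
        loss.backward()
        opt.step()
    return x.detach()


if __name__ == "__main__":
    model = SafetyModel(dim=16)  # in practice, trained on labelled safe/unsafe traces
    candidates = generate_adversarial_tests(model, dim=16)
    # Each candidate is then executed on the real system; disagreements between
    # the system's behaviour and the model's prediction are the interesting
    # results (and new training data for the safety model).
    print(candidates.shape)
```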
IMO there are usefully testable safety invariants too, but mostly at the implementation level rather than the level of system behaviour: for example, "every number in this layer should always be finite". Satisfying the invariant doesn't imply safety, but a violation implies that the system is not behaving as expected and therefore may be unsafe.
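For concreteness, a minimal sketch of checking that invariant at runtime, assuming a PyTorch model (the model here is just a placeholder; the hook pattern is the point):

```python
# Check the implementation-level invariant "every number in this layer should
# always be finite" on every forward pass, via PyTorch forward hooks.
import torch
import torch.nn as nn


def assert_finite_hook(module: nn.Module, inputs, output) -> None:
    """Raise if any activation produced by `module` is NaN or +/-inf."""
    if isinstance(output, torch.Tensor) and not torch.isfinite(output).all():
        raise RuntimeError(
            f"Non-finite activation detected in {module.__class__.__name__}"
        )


model = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 4))
for layer in model:
    layer.register_forward_hook(assert_finite_hook)

# A violation doesn't prove the system is unsafe, but it does prove the system
# is not behaving as intended, which is what makes the check useful.
out = model(torch.randn(16, 8))
```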