Testing with respect to learned models sounds great, and I expect there’s lots of interesting GAN-like work to be done in online adversarial test generation.
IMO there are usefully testable safety invariants too, but mostly at the implementation level rather than the level of system behaviour; for example, “every number in this layer should always be finite”. The invariant holding doesn’t imply safety, but a violation does imply that the system is not behaving as expected and therefore may be unsafe.
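As a minimal sketch of what such an implementation-level check might look like (the function and layer names here are hypothetical, and NumPy is assumed purely for illustration):

```python
import numpy as np

def assert_finite(layer_name, activations):
    """Invariant check: every number in a layer's output must be finite.

    A NaN or Inf here does not prove the system is unsafe, but it does
    prove the implementation is not behaving as expected.
    """
    arr = np.asarray(activations, dtype=float)
    finite = np.isfinite(arr)
    if not finite.all():
        bad = arr.size - int(np.count_nonzero(finite))
        raise AssertionError(f"{layer_name}: {bad} non-finite value(s) detected")

# A layer whose outputs are all finite passes silently...
assert_finite("dense_1", [[0.5, -1.2], [3.0, 0.0]])

# ...while a NaN anywhere trips the invariant.
try:
    assert_finite("dense_2", [[0.5, float("nan")]])
except AssertionError as e:
    print(e)
```

A check like this is cheap enough to leave enabled in testing (and often in production), which is exactly what makes implementation-level invariants more tractable than behavioural ones.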