High Impact Careers in Formal Verification: Artificial Intelligence
My research focuses on advanced testing and fuzzing tools, which are so much easier to use that people actually use them (e.g. in PyTorch, and I understand at DeepMind). If people seem interested, I could write up a post on their relevance to AI safety in a few weeks.
Core idea: even without proofs, writing out safety properties or other system invariants in code is valuable both (a) for deconfusion, and (b) because we can have a computer search for counterexamples using a variety of heuristics and feedback signals. At the current margin, this tends to improve team productivity and shift ML culture towards valuing specifications, which may be a good thing for AI x-risk.
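A minimal sketch of that idea, using Hypothesis (property-based testing) with NumPy. The softmax implementation, the invariants chosen, and the tolerances are illustrative assumptions on my part, not anything specific from the discussion above:

```python
import numpy as np
from hypothesis import given, strategies as st
from hypothesis.extra.numpy import arrays


def softmax(logits: np.ndarray) -> np.ndarray:
    # Subtracting the max keeps exp() from overflowing on large logits.
    shifted = logits - logits.max()
    exps = np.exp(shifted)
    return exps / exps.sum()


@given(
    arrays(
        dtype=np.float64,
        shape=st.integers(min_value=1, max_value=16),
        elements=st.floats(-1e6, 1e6),
    )
)
def test_softmax_is_a_probability_distribution(logits):
    probs = softmax(logits)
    # The invariant, written out as ordinary code: outputs are finite,
    # non-negative, and sum to one. Hypothesis searches for inputs that
    # violate it and shrinks any failure to a minimal counterexample.
    assert np.all(np.isfinite(probs))
    assert np.all(probs >= 0)
    assert np.isclose(probs.sum(), 1.0)
```

No proof is produced, but the specification now exists in executable form and the tool does the counterexample search for you.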
My general take is that anything relying on safety properties / system invariants being written out in formal languages / interpretable code seems not that helpful, because we just don’t get such clean safety properties / system invariants.
I am pretty enthusiastic about relying on learned models of safety properties / system invariants that are assumed to be good but not perfect specifications, and fuzzing / testing with respect to those learned models seems great.
Testing with respect to learned models sounds great, and I expect there’s lots of interesting GAN-like work to be done in online adversarial test generation.
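To make the "testing against a learned specification" idea concrete, here is a rough sketch of one possible search loop. Everything in it is hypothetical: `policy` and `safety_model` are stand-ins (the latter a learned, imperfect predictor of "this state-action pair is unsafe"), and gradient ascent on the observation is just one search strategy among many:

```python
import torch


def find_candidate_failure(policy, safety_model, obs_dim, steps=200, lr=0.05):
    """Search for an observation where the policy's action is flagged as risky.

    Assumes policy(obs) is differentiable in obs, and safety_model returns a
    scalar estimated risk for a concatenated 1-D (obs, action) vector.
    """
    obs = torch.randn(obs_dim, requires_grad=True)
    opt = torch.optim.Adam([obs], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        action = policy(obs)
        risk = safety_model(torch.cat([obs, action]))
        loss = -risk  # maximise the learned model's estimated risk
        loss.backward()
        opt.step()
    with torch.no_grad():
        final_risk = safety_model(torch.cat([obs, policy(obs)]))
    return obs.detach(), final_risk.item()
```

Any candidate it finds still needs checking against ground truth, since the learned specification is itself imperfect; that feedback loop is where the GAN-like flavour comes in.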
IMO there are usefully testable safety invariants too, but mostly at the implementation level rather than the level of system behaviour: for example, "every number in this layer should always be finite". Satisfying this does not imply safety, but a violation implies that the system is not behaving as expected and therefore may be unsafe.
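One concrete version of that finiteness invariant is a PyTorch forward hook that fails loudly as soon as any layer produces a NaN or Inf. The helper name and the choice to raise rather than log are my own illustrative assumptions:

```python
import torch


def attach_finiteness_checks(model: torch.nn.Module) -> None:
    """Register hooks that assert every layer's tensor output is finite."""

    def check(module, inputs, output):
        # Only tensor outputs are checked; tuple/dict outputs are skipped here.
        if isinstance(output, torch.Tensor) and not torch.isfinite(output).all():
            raise RuntimeError(
                f"Non-finite values in output of {module.__class__.__name__}"
            )

    for layer in model.modules():
        layer.register_forward_hook(check)
```

Usage: call `attach_finiteness_checks(model)` once, and any NaN/Inf during a forward pass then fails immediately instead of silently propagating. As above, a violation does not prove the system is unsafe, but it does show it is not behaving as intended.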