When it comes to formal verification, I’m curious what you think about the heuristic-arguments line of research that ARC is pursuing:
https://www.lesswrong.com/posts/QA3cmgNtNriMpxQgo/research-update-towards-a-law-of-iterated-expectations-for
It isn’t formal verification in the usual sense of the word, but rather probabilistic verification, if that makes sense.
You could then apply something like control-theory methods to ensure that the expected divergence from the heuristic estimate stays below a certain threshold in different places (see the sketch below). In the limit, it seems to me this could converge towards formal verification proofs; it’s almost a Swiss-cheese approach at the model level.
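To gesture at what I mean, here’s a minimal sketch in Python (my own toy illustration, not anything ARC-specific: `model`, `heuristic`, and `draw_input` are hypothetical stand-ins, and the bound is a plain Hoeffding concentration bound):

```python
import math
import random

def bound_expected_divergence(model, heuristic, draw_input,
                              n=10_000, confidence=0.95, div_max=1.0):
    """One-sided high-confidence upper bound on E|model(x) - heuristic(x)|.

    Assumes each divergence lies in [0, div_max], which Hoeffding's
    inequality requires; clamping enforces that assumption explicitly.
    """
    xs = [draw_input() for _ in range(n)]
    divs = [min(abs(model(x) - heuristic(x)), div_max) for x in xs]
    empirical_mean = sum(divs) / n
    # Hoeffding: P(true mean > empirical mean + eps) <= exp(-2*n*eps^2 / div_max^2)
    eps = div_max * math.sqrt(math.log(1.0 / (1.0 - confidence)) / (2.0 * n))
    return empirical_mean + eps

# Toy usage: how well does a crude piecewise-linear heuristic track a
# saturating "model"? Running this monitor over different input
# distributions ("in different places") and refining the heuristic
# wherever the bound is too loose is the control-loop flavor I mean.
model = lambda x: math.tanh(x)
heuristic = lambda x: max(-1.0, min(1.0, x))
draw = lambda: random.gauss(0.0, 1.0)
print(bound_expected_divergence(model, heuristic, draw, div_max=2.0))
```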
(Yes, this comment is a bit random with respect to the rest of the context, but I find it an interesting question for control in terms of formal verification, and it seemed like you might have some interesting takes here.)
You know what, I’ve identified a scenario where formal verification is both tractable and helps reduce AI risk: making our codebases far more secure and controllable, assuming heavy AI automation of mathematics and coding is achieved. (I expect that to happen before AIs can do everything else, since coding is a domain with strong feedback loops, where easy verification against ground truth is possible, and where you can gather very large amounts of data to continually improve.)
Here’s the link:
https://www.lesswrong.com/posts/oJQnRDbgSS8i6DwNu/the-hopium-wars-the-agi-entente-delusion#sgH9iCyon55yQyDqF
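To be concrete about the kind of guarantee I mean by “secure and controllable codebases”: a function shipped together with a machine-checked proof of a safety property. A minimal Lean 4 toy (my own example, not from the linked post):

```lean
-- Toy example: an index-clamping function plus a machine-checked proof
-- that its result never exceeds the buffer size, so out-of-bounds access
-- through it is impossible. Heavy AI automation would mean generating
-- and checking proofs like this at codebase scale.
def clampIndex (i bufSize : Nat) : Nat := min i bufSize

theorem clampIndex_safe (i bufSize : Nat) :
    clampIndex i bufSize ≤ bufSize := by
  unfold clampIndex
  exact Nat.min_le_right i bufSize
```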
I’ve also discussed the value and limits of formal proofs in another comment below. Short answer: it’s probably really helpful in an infrastructure sense, but not as a means to make anything other than software and mathematics safe and formally specified (which would itself be a huge win if we could do it). We will not be able to prove that a piece of software isn’t a threat to anything else in the world entirely, and the same applies to biology, say in determining whether a gene or virus will harm humans, mostly because we don’t have a path to quantifying all the possible higher-order behaviors a system doesn’t have:
https://www.lesswrong.com/posts/oJQnRDbgSS8i6DwNu/the-hopium-wars-the-agi-entente-delusion#2GDjfZTJ8AZrh9i7Y
My take on this is that I’d be interested to see how the research goes. There may be value in this approach, and I think it may be a useful way to get a quantitative estimate/bound in the future, because it relaxes its goals compared to full formal proofs.
I’d like to see what eventually happens with this research direction:
Could we reliably give heuristic arguments for neural networks where proofs fail, or is it too hard to provide relevant arguments?
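To make that question concrete, here’s a toy of the style I have in mind (my own illustration, not ARC’s actual method): analytically estimate the expected output of a tiny ReLU network under a Gaussianity assumption on the pre-activations, then check the estimate empirically.

```python
import math
import random

def relu_mean(mu: float, sigma: float) -> float:
    """E[max(Z, 0)] for Z ~ N(mu, sigma^2), in closed form."""
    if sigma == 0.0:
        return max(mu, 0.0)
    z = mu / sigma
    pdf = math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return mu * cdf + sigma * pdf

def heuristic_output_mean(w1, b1, w2):
    """Heuristic estimate of E[w2 . relu(w1*x + b1)] for scalar x ~ N(0, 1).

    Treats each pre-activation as Gaussian (mean b1[i], std |w1[i]|).
    Exact for this one-layer case; with depth, the same style of estimate
    becomes genuinely heuristic, and the interesting question is whether
    we can bound how wrong it gets.
    """
    return sum(w2i * relu_mean(b1i, abs(w1i))
               for w1i, b1i, w2i in zip(w1, b1, w2))

def monte_carlo_output_mean(w1, b1, w2, n=200_000):
    total = 0.0
    for _ in range(n):
        x = random.gauss(0.0, 1.0)
        total += sum(w2i * max(w1i * x + b1i, 0.0)
                     for w1i, b1i, w2i in zip(w1, b1, w2))
    return total / n

# The two numbers should agree closely for this toy network.
w1, b1, w2 = [0.7, -1.3, 0.2], [0.1, 0.5, -0.4], [1.0, -0.5, 2.0]
print(heuristic_output_mean(w1, b1, w2))    # analytic estimate
print(monte_carlo_output_mean(w1, b1, w2))  # empirical check
```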
I do want to say that, on formal verification/proof itself, I think the most useful application is not proving non-trivial things, but rather keeping ourselves honest about the assumptions we are using.