I think you’re directionally correct; I agree about the following:
A critical part of formally verifying real-world systems involves coarse-graining uncountable state spaces into (sums of subsets of products of) finite state spaces.
I imagine these would be mostly, if not entirely, learned.
There is a tradeoff between computing time and bound tightness.
However, perhaps my critical disagreement is that I do think probabilistic bounds can be guaranteed sound, with respect to an uncountable model, in finite time. (They just might not be tight enough to justify confidence in the proposed policy network, in which case the policy would not exit the box, and the failure is a flop rather than a foom.)
Perhaps the keyphrase you’re missing is “interval MDP abstraction”. One specific paper that combines RL, model checking, and coarse-graining in the way you’re asking for is “Formal Controller Synthesis for Continuous-Space MDPs via Model-Free Reinforcement Learning”.
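For a concrete sense of what “guaranteed sound in finite time” can look like, here is a minimal sketch (not the method of that paper; the dynamics, noise model, horizon, and cell count are all invented toy choices). It coarse-grains a one-dimensional continuous-state stochastic system into finitely many cells, computes per-transition probability lower bounds from the cell endpoints, and propagates them to get a guaranteed lower bound on a finite-horizon safety probability:

```python
import numpy as np

# Toy setup -- every number here is an illustrative assumption, not something
# from the discussion above.  Autonomous stochastic system on the real line:
#     x_{t+1} = 0.5 * x_t + 0.2 + w_t,    w_t ~ Uniform[-0.1, 0.1]
# Property: remain inside the safe set [0, 1] for H steps.
# Abstraction: partition [0, 1] into N equal cells and compute, for each pair
# of cells (i, j), a guaranteed lower bound on the probability of jumping
# from anywhere in cell i into cell j.

N, H, NOISE = 50, 10, 0.1
edges = np.linspace(0.0, 1.0, N + 1)

def drift(x):
    return 0.5 * x + 0.2

def prob_land_in_cell(x, j):
    """Exact probability that the successor of the concrete state x lands in cell j."""
    lo, hi = drift(x) - NOISE, drift(x) + NOISE          # support of x_{t+1}
    return max(0.0, min(hi, edges[j + 1]) - max(lo, edges[j])) / (2 * NOISE)

# prob_land_in_cell(x, j) is unimodal in x, so its minimum over a cell is
# attained at one of the cell's two endpoints; evaluating both endpoints
# therefore gives a sound per-transition lower bound.
P_low = np.array([[min(prob_land_in_cell(edges[i], j), prob_land_in_cell(edges[i + 1], j))
                   for j in range(N)] for i in range(N)])

# Backward induction with the lower bounds.  V[i] is a sound lower bound on
# the H-step safety probability from anywhere in cell i: for every concrete
# x in cell i,  sum_j P_j(x) * V_next[j]  >=  sum_j P_low[i, j] * V_next[j],
# because V_next >= 0 and P_j(x) >= P_low[i, j].
V = np.ones(N)
for _ in range(H):
    V = P_low @ V

print(f"Sound {H}-step safety lower bound: worst cell {V.min():.3f}, best cell {V.max():.3f}")
```

The bound is sound but loose: the true safety probability of this toy system is 1, while the certified bound at this resolution comes out well below that, and it tightens as the number of cells grows, which is exactly the computing-time versus bound-tightness tradeoff above. A full interval MDP abstraction would additionally track per-transition upper bounds and handle actions, supporting controller synthesis rather than only verification of a fixed system.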
That being said—
I don’t expect existing model-checking methods to scale well. I think we will need to incorporate powerful AI heuristics into the search for a proof certificate, which may include various types of argument steps not limited to a monolithic coarse-graining (as mentioned in my footnote 2).
And I do think that relies on having a good meta-ontology or compositional world-modeling framework.
And I do think that is the hard part, actually!
At least, it is the part I endorse focusing on first.
If others follow your train of thought to narrow in on the conclusion that the compositional world-modeling framework problem, as Owen Lynch and I have laid it out in this post, is potentially “the hard part” of AI safety, that would be wonderful…
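As a toy illustration of the “AI heuristics in the search for a proof certificate” idea above (with invented specifics: the same system as the previous sketch over a shorter horizon, a crude “split the widest cell” rule standing in for a learned proposal heuristic, and an arbitrary threshold and budget), the loop below lets the heuristic decide where to refine the abstraction while soundness rests entirely on the checking step; if the budget runs out without a certificate, the result is rejection rather than an unsound claim:

```python
import numpy as np

# Toy "propose, then soundly check" loop on the same system as the previous
# sketch (shorter horizon).  The splitting rule is a crude hand-written stand-in
# for a learned proposal heuristic, and the threshold and budget are arbitrary.
# The division of labour is the point: the heuristic only decides *where* to
# refine; soundness comes entirely from the checking step.

NOISE, H, THRESHOLD, BUDGET = 0.1, 5, 0.8, 200

def drift(x):
    return 0.5 * x + 0.2

def certified_bound(edges):
    """Sound lower bound on H-step safety for each cell of a (possibly non-uniform) partition."""
    n = len(edges) - 1
    def p_land(x, j):
        lo, hi = drift(x) - NOISE, drift(x) + NOISE
        return max(0.0, min(hi, edges[j + 1]) - max(lo, edges[j])) / (2 * NOISE)
    P_low = np.array([[min(p_land(edges[i], j), p_land(edges[i + 1], j))
                       for j in range(n)] for i in range(n)])
    V = np.ones(n)
    for _ in range(H):
        V = P_low @ V
    return V

edges = np.linspace(0.0, 1.0, 5)                  # start from a very coarse abstraction
for step in range(BUDGET):
    V = certified_bound(edges)
    if V.min() >= THRESHOLD:                      # certificate is strong enough: accept
        print(f"certified {V.min():.3f} >= {THRESHOLD} with {len(edges) - 1} cells after {step} refinements")
        break
    i = int(np.argmax(np.diff(edges)))            # heuristic: refine the widest cell
    edges = np.insert(edges, i + 1, (edges[i] + edges[i + 1]) / 2)
else:
    # Budget exhausted without a certificate: reject (a flop rather than a foom).
    print(f"not certified; best sound bound {certified_bound(edges).min():.3f}")
```

A learned model could propose much richer refinements, or entirely different argument steps, without changing this basic division of labour: proposals are untrusted, and only the checked certificate carries any guarantee.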
Thanks, that makes a lot of sense to me. I have some technical questions about the post with Owen Lynch, but I’ll follow up elsewhere.