I think you’re directionally correct; I agree about the following:
A critical part of formally verifying real-world systems involves coarse-graining uncountable state spaces into (sums of subsets of products of) finite state spaces.
I imagine these would be mostly, if not entirely, learned.
There is a tradeoff between computing time and bound tightness.
However, perhaps my critical disagreement is that I do think probabilistic bounds can be guaranteed sound, with respect to an uncountable model, in finite time. (They just might not be tight enough to justify confidence in the proposed policy network, in which case the policy would not exit the box, and the failure is a flop rather than a foom.)
Perhaps the keyphrase you’re missing is “interval MDP abstraction”. One specific paper that combines RL, model checking, and coarse-graining in the way you’re asking for is “Formal Controller Synthesis for Continuous-Space MDPs via Model-Free Reinforcement Learning”.
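For a concrete sense of what “guaranteed sound in finite time” can look like, here is a minimal sketch (not the method of that paper; the dynamics, noise model, horizon, and cell count are all invented toy choices). It coarse-grains a one-dimensional continuous-state stochastic system into finitely many cells, computes per-transition probability lower bounds from the cell endpoints, and propagates them to get a guaranteed lower bound on a finite-horizon safety probability:

```python
import numpy as np

# Toy setup -- every number here is an illustrative assumption, not something
# from the discussion above.  Autonomous stochastic system on the real line:
#     x_{t+1} = 0.5 * x_t + 0.2 + w_t,    w_t ~ Uniform[-0.1, 0.1]
# Property: remain inside the safe set [0, 1] for H steps.
# Abstraction: partition [0, 1] into N equal cells and compute, for each pair
# of cells (i, j), a guaranteed lower bound on the probability of jumping
# from anywhere in cell i into cell j.

N, H, NOISE = 50, 10, 0.1
edges = np.linspace(0.0, 1.0, N + 1)

def drift(x):
    return 0.5 * x + 0.2

def prob_land_in_cell(x, j):
    """Exact probability that the successor of the concrete state x lands in cell j."""
    lo, hi = drift(x) - NOISE, drift(x) + NOISE          # support of x_{t+1}
    return max(0.0, min(hi, edges[j + 1]) - max(lo, edges[j])) / (2 * NOISE)

# prob_land_in_cell(x, j) is unimodal in x, so its minimum over a cell is
# attained at one of the cell's two endpoints; evaluating both endpoints
# therefore gives a sound per-transition lower bound.
P_low = np.array([[min(prob_land_in_cell(edges[i], j), prob_land_in_cell(edges[i + 1], j))
                   for j in range(N)] for i in range(N)])

# Backward induction with the lower bounds.  V[i] is a sound lower bound on
# the H-step safety probability from anywhere in cell i: for every concrete
# x in cell i,  sum_j P_j(x) * V_next[j]  >=  sum_j P_low[i, j] * V_next[j],
# because V_next >= 0 and P_j(x) >= P_low[i, j].
V = np.ones(N)
for _ in range(H):
    V = P_low @ V

print(f"Sound {H}-step safety lower bound: worst cell {V.min():.3f}, best cell {V.max():.3f}")
```

The bound is sound but loose: the true safety probability of this toy system is 1, while the certified bound at this resolution comes out well below that, and it tightens as the number of cells grows, which is exactly the computing-time versus bound-tightness tradeoff above. A full interval MDP abstraction would additionally track per-transition upper bounds and handle actions, supporting controller synthesis rather than only verification of a fixed system.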
That being said—
I don’t expect existing model-checking methods to scale well. I think we will need to incorporate powerful AI heuristics into the search for a proof certificate, which may include various types of argument steps not limited to a monolithic coarse-graining (as mentioned in my footnote 2).
And I do think that relies on having a good meta-ontology or compositional world-modeling framework.
And I do think that is the hard part, actually!
At least, it is the part I endorse focusing on first.
If others follow your train of thought to narrow in on the conclusion that the compositional world-modeling framework problem, as Owen Lynch and I have laid it out in this post, is potentially “the hard part” of AI safety, that would be wonderful…
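As a toy illustration of the “AI heuristics in the search for a proof certificate” idea above (with invented specifics: the same system as the previous sketch over a shorter horizon, a crude “split the widest cell” rule standing in for a learned proposal heuristic, and an arbitrary threshold and budget), the loop below lets the heuristic decide where to refine the abstraction while soundness rests entirely on the checking step; if the budget runs out without a certificate, the result is rejection rather than an unsound claim:

```python
import numpy as np

# Toy "propose, then soundly check" loop on the same system as the previous
# sketch (shorter horizon).  The splitting rule is a crude hand-written stand-in
# for a learned proposal heuristic, and the threshold and budget are arbitrary.
# The division of labour is the point: the heuristic only decides *where* to
# refine; soundness comes entirely from the checking step.

NOISE, H, THRESHOLD, BUDGET = 0.1, 5, 0.8, 200

def drift(x):
    return 0.5 * x + 0.2

def certified_bound(edges):
    """Sound lower bound on H-step safety for each cell of a (possibly non-uniform) partition."""
    n = len(edges) - 1
    def p_land(x, j):
        lo, hi = drift(x) - NOISE, drift(x) + NOISE
        return max(0.0, min(hi, edges[j + 1]) - max(lo, edges[j])) / (2 * NOISE)
    P_low = np.array([[min(p_land(edges[i], j), p_land(edges[i + 1], j))
                       for j in range(n)] for i in range(n)])
    V = np.ones(n)
    for _ in range(H):
        V = P_low @ V
    return V

edges = np.linspace(0.0, 1.0, 5)                  # start from a very coarse abstraction
for step in range(BUDGET):
    V = certified_bound(edges)
    if V.min() >= THRESHOLD:                      # certificate is strong enough: accept
        print(f"certified {V.min():.3f} >= {THRESHOLD} with {len(edges) - 1} cells after {step} refinements")
        break
    i = int(np.argmax(np.diff(edges)))            # heuristic: refine the widest cell
    edges = np.insert(edges, i + 1, (edges[i] + edges[i + 1]) / 2)
else:
    # Budget exhausted without a certificate: reject (a flop rather than a foom).
    print(f"not certified; best sound bound {certified_bound(edges).min():.3f}")
```

A learned model could propose much richer refinements, or entirely different argument steps, without changing this basic division of labour: proposals are untrusted, and only the checked certificate carries any guarantee.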
Thanks, that makes a lot of sense to me. I have some technical questions about the post with Owen Lynch, but I’ll follow up elsewhere.