Simulating the time evolution of a model from its dynamical equations is only one way of establishing properties about it. For example, a harmonic oscillator https://en.wikipedia.org/wiki/Harmonic_oscillator has the dynamical equation m d^2x/dt^2 = -kx. You can simulate that, but you can also prove that the kinetic plus potential energy is conserved and get bounds on its behavior arbitrarily far into the future.
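As a concrete illustration (a minimal Python sketch; the values of m and k are arbitrary choices of mine, not from this conversation): the conserved energy E = (1/2)mv^2 + (1/2)kx^2 immediately gives the bound |x| <= sqrt(2E/k) for all time, whereas a simulation can only check the trajectory one step at a time.

```python
import math

# Harmonic oscillator: m * x'' = -k * x
# Conserved energy E = 0.5*m*v**2 + 0.5*k*x**2 implies |x| <= sqrt(2E/k) forever.
m, k = 1.0, 4.0                  # illustrative parameters (assumed values)
x, v = 1.0, 0.0                  # initial conditions
dt, steps = 1e-3, 100_000

E0 = 0.5 * m * v**2 + 0.5 * k * x**2
x_bound = math.sqrt(2 * E0 / k)  # provable bound, valid arbitrarily far into the future

# Semi-implicit (symplectic) Euler integration, which keeps energy drift small.
for _ in range(steps):
    v += (-k / m) * x * dt
    x += v * dt

E = 0.5 * m * v**2 + 0.5 * k * x**2
print(f"relative energy drift after {steps} steps: {abs(E - E0) / E0:.2e}")
print(f"|x| = {abs(x):.4f}, provable bound sqrt(2E/k) = {x_bound:.4f}")
```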
Sure, but it seems highly unlikely that there are any such neat simplifications for complex cognitive systems built from neural networks.
Other than “sapient beings do things that further their goals, in their best estimation”, which is a rough predictor and is what we’re already trying to focus on. But the devil is in the details, and the important question is how the goal is represented and understood.
Oh yeah, by their very nature it’s likely to be hard to predict intelligent systems’ behavior in detail. We can put constraints on them, though, and prove that they operate within those constraints.
Even simple systems like random SAT problems https://en.wikipedia.org/wiki/SAT_solver can have a very rich statistical structure. And the behavior of the solvers can be quite unpredictable.
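To make that concrete (a rough sketch; the generator and the tiny DPLL-style solver below are illustrative stand-ins of mine, not any particular solver from the link): random 3-SAT instances drawn near a clause-to-variable ratio of about 4.27 are known to be especially hard, and the number of branching decisions even a simple solver makes can swing wildly from one random instance to the next.

```python
import random

def random_3sat(n_vars, n_clauses, rng):
    """Random 3-SAT: each clause picks 3 distinct variables with random signs."""
    return [tuple(v if rng.random() < 0.5 else -v
                  for v in rng.sample(range(1, n_vars + 1), 3))
            for _ in range(n_clauses)]

def dpll(clauses, assignment, stats):
    """Tiny DPLL: simplify, unit-propagate, branch; counts branching decisions."""
    simplified = []
    for clause in clauses:
        if any(assignment.get(abs(l)) == (l > 0) for l in clause):
            continue                      # clause already satisfied
        rest = [l for l in clause if abs(l) not in assignment]
        if not rest:
            return False                  # clause falsified: conflict
        simplified.append(rest)
    if not simplified:
        return True                       # all clauses satisfied
    for clause in simplified:             # unit propagation
        if len(clause) == 1:
            l = clause[0]
            return dpll(simplified, {**assignment, abs(l): l > 0}, stats)
    stats["decisions"] += 1               # branch on an unassigned variable
    var = abs(simplified[0][0])
    return any(dpll(simplified, {**assignment, var: val}, stats)
               for val in (True, False))

rng = random.Random(0)
n = 40
for trial in range(5):
    clauses = random_3sat(n, int(4.27 * n), rng)   # near the hard ratio
    stats = {"decisions": 0}
    sat = dpll(clauses, {}, stats)
    print(f"instance {trial}: {'SAT' if sat else 'UNSAT'}, {stats['decisions']} decisions")
```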
In some sense, this is the source of the unpredictability of cryptographic hash functions. Oded Goldreich proposed an unbelievably simple Boolean function which is believed to be one-way: https://link.springer.com/chapter/10.1007/978-3-642-22670-0_10
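The construction really is remarkably simple, roughly along these lines (a simplified sketch; the specific 5-bit predicate and the random choice of neighborhoods are my own illustrative assumptions, not details taken from the paper): fix, once and for all, which small set of input bits feeds each output bit, and compute every output bit by applying the same simple predicate to its inputs. Evaluating the function is trivial; the conjectured hardness is entirely in inverting it.

```python
import random

def goldreich_style_candidate(n, d=5, seed=0):
    """Candidate one-way function f: {0,1}^n -> {0,1}^n in the style of
    Goldreich's construction: each output bit applies one fixed, simple
    predicate to a fixed subset of d input bits.  The predicate used here,
    b0 ^ b1 ^ b2 ^ (b3 & b4), is a commonly studied choice (an assumption,
    not necessarily the one in the linked paper)."""
    rng = random.Random(seed)
    # Public "wiring": which d input positions feed each of the n output bits.
    neighborhoods = [rng.sample(range(n), d) for _ in range(n)]

    def predicate(b):
        return b[0] ^ b[1] ^ b[2] ^ (b[3] & b[4])

    def f(x):                      # x is a list of n bits (0/1)
        return [predicate([x[j] for j in nbhd]) for nbhd in neighborhoods]

    return f

n = 32
f = goldreich_style_candidate(n)
rng = random.Random(1)
x = [rng.randrange(2) for _ in range(n)]
print("input :", "".join(map(str, x)))
print("output:", "".join(map(str, f(x))))
```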
On the other hand, I think it is often possible to distill the behavior for a particular task from a rich intelligence into simple code with provable properties.