Steve_Omohundro comments on Provably Safe AI: Worldview and Projects

Steve_Omohundro 12 Aug 2024 4:40 UTC
3 points
−3
Intervals are often a great simple form of “enclosure” in continuous domains. For simple functions there is also “interval arithmetic” (https://en.wikipedia.org/wiki/Interval_arithmetic), which cheaply produces a bounding interval on the output of a function given intervals on its inputs. But, as you say, for complex functions it can blow up. For a simple example of why, consider the function “f(x)=x-x” evaluated on the input interval [0,1]. In the simplest interval arithmetic, the interval for subtraction has to bound the worst possible members of the intervals of its two inputs. In this case that would be a lower bound of “0-1” and an upper bound of “1-0”, producing the resulting interval [-1,1]. But, of course, “x-x” is always 0, so this is a huge over-approximation. People have developed all kinds of techniques for capturing the correlations between variables in evaluating circuits on intervals. And you can always shrink the error by splitting the input intervals and doing “branch and bound”. But all of those are just particular implementation choices in proving bounds on the output of the function. Advanced AI theorem provers (like AlphaProof) can use very sophisticated techniques to accurately get the true bound on the output of a function.
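As a rough illustration of the “x-x” example, here is a minimal Python sketch. The `Interval` class, `f`, and `split_bound` are hypothetical names made up for this illustration, not part of any library. It ignores outward rounding, which a sound implementation would need, and the uniform splitting is only a simplified stand-in for real branch and bound (there is no pruning).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def __sub__(self, other: "Interval") -> "Interval":
        # Naive rule: treat the two operands as independent, so the result
        # must cover the worst-case pairing of their endpoints.
        return Interval(self.lo - other.hi, self.hi - other.lo)


def f(x: Interval) -> Interval:
    # f(x) = x - x; the naive rule cannot see that both operands are the same x.
    return x - x


def split_bound(x: Interval, pieces: int) -> Interval:
    # Evaluate f on equal-width sub-intervals and take the hull of the results.
    width = (x.hi - x.lo) / pieces
    parts = [
        f(Interval(x.lo + i * width, x.lo + (i + 1) * width))
        for i in range(pieces)
    ]
    return Interval(min(p.lo for p in parts), max(p.hi for p in parts))


naive = f(Interval(0.0, 1.0))
refined = split_bound(Interval(0.0, 1.0), 4)
print(naive)    # Interval(lo=-1.0, hi=1.0)
print(refined)  # Interval(lo=-0.25, hi=0.25)
```

Splitting into four pieces shrinks the bound from [-1,1] to [-0.25,0.25], but it never reaches the true value of 0 exactly; that needs something that tracks the correlation between the two occurrences of x.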
But, it may be that it’s not a fruitful approach to try to bound the behavior of complex neural nets such as transformers. In our approach, we don’t need to understand or constrain a complex AI generating a solution or a control policy. Rather, we require the AI to generate a program, control policy, or simple network for taking actions in the situation of interest. And we force it to generate a proof that it satisfies given safety requirements. If it can’t do that, then it has no business taking actions in a dangerous setting.
- ryan_greenblatt 12 Aug 2024 6:07 UTC
  12 points
  14
  Parent
  Rather, we require the AI to generate a program, control policy, or simple network for taking actions in the situation of interest. And we force it to generate a proof that it satisfies given safety requirements. If it can’t do that, then it has no business taking actions in a dangerous setting.
  
  This seems near certain to be cripplingly uncompetitive^[1] even with massive effort on improving verification.
  
  [1] If applied to all potentially dangerous applications.
- ryan_greenblatt 12 Aug 2024 6:02 UTC
  11 points
  7
  Parent
  I agree you can do better than naive interval propagation by taking into account correlations. However, it will be tricky to get a much better bound without the time complexity ballooning (tracking all possible correlations requires exponential time).
  
  More strongly, I think that if an adversary controlled the non-determinism (e.g. summation order) in current efficient inference setups, they would actually be able to strongly influence the AI to an actually dangerous extent. We are likely to depend on this non-determinism being non-adversarial (which is a reasonable assumption to make).
  
  (And you can’t prove a false statement...)
  
  See also heuristic arguments which try to resolve this sort of issue by assuming a lack of structure.
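
On the summation-order point in the reply above: floating-point addition is not associative, so the order in which terms are accumulated can change the result. The Python sketch below only shows that underlying effect with a hand-picked toy input. `sum_in_order` is a hypothetical helper written for this illustration, and nothing here demonstrates adversarial control of a real inference setup.

```python
def sum_in_order(values):
    # Plain left-to-right accumulation in double precision.
    total = 0.0
    for v in values:
        total += v
    return total


# Same three numbers, two different accumulation orders.
a = sum_in_order([1e16, 1.0, -1e16])  # (1e16 + 1.0) - 1e16: the 1.0 is rounded away
b = sum_in_order([1e16, -1e16, 1.0])  # (1e16 - 1e16) + 1.0: the 1.0 survives
print(a, b)  # 0.0 1.0
```

An explicit loop is used rather than the built-in `sum`, since recent Python versions apply compensated summation to floats and would hide the effect.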