ryan_greenblatt comments on Provably Safe AI: Worldview and Projects

ryan_greenblatt 12 Aug 2024 6:02 UTC
I agree you can do better than naive interval propagation by taking correlations into account. However, it will be tricky to get a much tighter bound without the time complexity ballooning (tracking all possible correlations requires exponential time).
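To make the looseness of naive interval propagation concrete, here is a minimal sketch that contrasts it with affine arithmetic. Affine arithmetic is one standard way to track correlations; it is my choice of illustration, not necessarily what either side of this thread has in mind. The class names and the toy "network" `z = (x1 + x2) + (x1 - x2)` are hypothetical. Affine forms track only *linear* correlations through shared noise symbols, which keeps the cost polynomial. Nonlinear operations such as ReLU or multiplication would need approximations that introduce fresh noise symbols, and that is where tightness is lost again.

```python
from dataclasses import dataclass, field


@dataclass
class Interval:
    """Naive interval: forgets which inputs a value came from."""
    lo: float
    hi: float

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        return Interval(self.lo - other.hi, self.hi - other.lo)


@dataclass
class Affine:
    """Affine form: center + sum(coeff * eps), with each eps in [-1, 1].

    Values that share a noise symbol are linearly correlated.
    """
    center: float
    coeffs: dict = field(default_factory=dict)

    def _combine(self, other, sign):
        keys = set(self.coeffs) | set(other.coeffs)
        return Affine(
            self.center + sign * other.center,
            {k: self.coeffs.get(k, 0.0) + sign * other.coeffs.get(k, 0.0) for k in keys},
        )

    def __add__(self, other):
        return self._combine(other, 1.0)

    def __sub__(self, other):
        return self._combine(other, -1.0)

    def bounds(self):
        # Worst case: every noise symbol independently at +1 or -1.
        radius = sum(abs(c) for c in self.coeffs.values())
        return (self.center - radius, self.center + radius)


# Hypothetical two-layer linear "network": z = (x1 + x2) + (x1 - x2) == 2 * x1.
x1_i, x2_i = Interval(-1.0, 1.0), Interval(-1.0, 1.0)
z_interval = (x1_i + x2_i) + (x1_i - x2_i)

x1_a, x2_a = Affine(0.0, {"eps1": 1.0}), Affine(0.0, {"eps2": 1.0})
z_affine = (x1_a + x2_a) + (x1_a - x2_a)

print(z_interval)         # Interval(lo=-4.0, hi=4.0)
print(z_affine.bounds())  # (-2.0, 2.0)
```

The interval bound is twice the true range because the `x2` terms cancel only when their correlation is tracked. Here the affine bound happens to be exact, since the whole computation is linear.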

More strongly, I think that if an adversary controlled the non-determinism (e.g., summation order) in current efficient inference setups, they would actually be able to influence the AI strongly enough to be dangerous. We are likely to depend on this non-determinism being non-adversarial (which is a reasonable assumption to make).
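The mechanism behind this is that floating-point addition is not associative, so the order of a reduction changes its result. The sketch below is a toy illustration, not a model of any real inference stack: the values and the `> 0.5` threshold are hypothetical, chosen so that the effect is visible in three numbers. It enumerates every summation order an adversary could pick and shows that the choice alone flips a downstream decision.

```python
import itertools
import math

# Hypothetical values: 1e16 exceeds 2**53, so in float64 the spacing between
# neighbouring numbers there is 2, and adding 1.0 can be rounded away.
values = [1e16, 1.0, -1e16]


def ordered_sum(xs):
    """Left-to-right float64 summation, standing in for one reduction order."""
    total = 0.0
    for x in xs:
        total += x
    return total


# Every order an adversary controlling the reduction could choose.
outcomes = {order: ordered_sum(order) for order in itertools.permutations(values)}
results = set(outcomes.values())

# A hypothetical downstream decision that thresholds the sum.
decisions = {total > 0.5 for total in results}

exact = math.fsum(values)  # correctly rounded sum of the exact values

print(sorted(results))  # [0.0, 1.0]
print(exact)            # 1.0
```

In real setups the analogous freedom comes from things like the order of parallel reductions. Whether that freedom is enough to steer a model dangerously is the claim above, not something this toy demonstrates.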

(And you can’t prove a false statement...)

See also heuristic arguments, which try to resolve this sort of issue by assuming a lack of structure.