Yes, I am indeed thinking about this.
Let’s first consider the easiest case, in which we ask for a solution/output to be not merely provably correct according to some spec, but provably unique. Then there is clearly no space for steganography.
It is a little bit harder if the solution has some symmetries, like permutations of the order in which things are serialized. For this we can use sorting and other normalization techniques (converting an output to a unique representative of its equivalence class).
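The normalization idea can be sketched in a few lines. This is a toy example under the assumption that the only symmetry is serialization order (the edge-list data and the `canonicalize` helper are hypothetical, purely for illustration):

```python
# Normalize an output to a unique representative of its equivalence
# class, assuming the only symmetry is serialization order.

def canonicalize(edges):
    """Map any serialization order of an undirected edge set to one canonical form."""
    # Treat each edge as unordered, then sort the whole list:
    # every permutation of the input now maps to the same output.
    return sorted(tuple(sorted(e)) for e in edges)

# Two different serializations of the same solution collapse to
# one representative, leaving no room to encode extra bits in the order.
assert canonicalize([(3, 1), (2, 5)]) == canonicalize([(5, 2), (1, 3)])
```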
If normalization is hard, we can set up the box to be interactive, so that users cannot see the solution, but can only run queries on it (e.g. “next action please”) and the user must prove that the queries are invariant under (aka “respect”) the equivalence relation.
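A minimal sketch of such an interactive box (the class and its interface are hypothetical names, not an existing API): the user never sees the solution directly, and the box evaluates each query against a canonical representative, so equivalent solutions are indistinguishable from outside by construction.

```python
class Box:
    """Hides a solution; exposes only query answers, never the solution itself."""

    def __init__(self, solution):
        # For this toy example the equivalence is serialization order,
        # so the canonical representative is just the sorted form.
        self._canon = sorted(solution)

    def query(self, f):
        # Evaluating f on the canonical form forces f's answers to
        # respect the equivalence relation.
        return f(self._canon)

# Two equivalent solutions give identical answers to every query:
box_a = Box(["step2", "step1"])
box_b = Box(["step1", "step2"])
assert box_a.query(len) == box_b.query(len)
assert box_a.query(lambda s: s[0]) == box_b.query(lambda s: s[0])
```

In a full design the user would additionally have to *prove* invariance of `f` before the box runs it; here that property only holds because the box canonicalizes internally.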
But for most real-world problems there are many equally specification-satisfying strategies that are not behaviourally equivalent. The next, harder step is to establish an upper bound on the log-cardinality of the set of behavioural equivalence classes that satisfy the specification. We want to keep the channel capacity (in bits per day) available for steganography, whether aimed at humans, insecure cyber-infrastructure, or biosequence synthesis, below a very low threshold (maybe 2 bits per day is okay).
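The capacity bound above is just a counting argument; here is the back-of-envelope version, with illustrative numbers (one released output per day is an assumption of this example):

```python
import math

def capacity_bits_per_output(num_classes):
    # If at most num_classes behavioural equivalence classes satisfy
    # the spec, one released output can encode at most log2(num_classes)
    # bits, regardless of how the inner optimizer chooses among them.
    return math.log2(num_classes)

# Only 4 spec-satisfying classes -> at most 2 bits per output,
# i.e. 2 bits/day at one output per day.
assert capacity_bits_per_output(4) == 2.0
# A provably unique solution (1 class) carries zero bits.
assert capacity_bits_per_output(1) == 0.0
```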
Often there will be too many degrees of freedom for this, so instead we need to do entropy regularization, which can be done elegantly with GFlowNets. I have not worked out all the details here, but the hope is that we can derive a sound bound or PAC estimate of the KL divergence of the observable behaviour (which is sampled from the policy distribution, dependent upon a stream of certified randomness) from the Fréchet mean/centroid of all correct behaviours. This quantity would provide an upper bound on how much information the optimization process inside the box can transmit to the outside by selecting a biased policy distribution.
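An illustrative sketch of the quantity in question, not the full sound/PAC machinery: measure how far a sampled policy sits from the centroid of correct behaviours via KL divergence in bits. Taking the centroid to be uniform over four correct behaviour classes is an assumption of this toy example.

```python
import math

def kl_bits(p, q):
    """KL(p || q) in bits; p and q are discrete distributions over the same support."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

centroid = [0.25, 0.25, 0.25, 0.25]   # assumed centroid of correct behaviours
biased = [0.70, 0.10, 0.10, 0.10]     # a policy tilted to signal one class

# An unbiased policy has zero divergence, hence zero budget to transmit;
# any bias shows up as a positive number of bits.
assert kl_bits(centroid, centroid) == 0.0
assert kl_bits(biased, centroid) > 0.0
```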
I would expect there to be a time complexity blowup if you try to drive the entropy all the way to zero, unfortunately: such things usually have a multiplier like log(1/ϵ), where ϵ is the desired entropy leakage. In practice I think that would make it feasible to keep leakage down to something like a bit per sentence, and then if you have 1000 sentences you have 1000 bits. That may mean you can get a “not 1GB” guarantee, but not something much smaller than that.
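A back-of-envelope for that scaling, under the stated assumptions (cost multiplier ~log(1/ϵ) per output, ~1 bit per sentence, a 1000-sentence document; all numbers illustrative):

```python
import math

def relative_cost(eps_bits):
    # Assumed scaling: driving leakage per output down to eps_bits
    # multiplies compute by roughly log(1/eps_bits).
    return math.log(1.0 / eps_bits)

# Tolerating 1 bit/sentence costs nothing extra under this model...
assert relative_cost(1.0) == 0.0
# ...and even 0.001 bits/sentence is only a modest multiplier,
assert relative_cost(0.001) < 10
# but at ~1 bit/sentence, 1000 sentences still leak ~1000 bits:
total_bits = 1 * 1000
# far below 1 GB (~8e9 bits), hence "not 1GB" but not much tighter.
assert total_bits < 8e9
```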