TLW comments on Implications of automated ontology identification

TLW 23 Feb 2022 3:38 UTC
In the 99% safety guarantee, you can just train a bunch of separate predictor/reporter pairs on the same initial training data and take the intersection of their decision boundaries to get a 99.9% guarantee.
Counterexample: here is an infinite set of distinct predictors, each with a 99% safety guarantee, that when combined have a… 99% safety guarantee.

Ground truth:
$0 \leq x \leq 1,\ x \in \mathbb{R}$
$f(x) = \begin{cases} \text{YES}, & \text{if } x \leq 0.5, \\ \text{NO}, & \text{if } x > 0.5. \end{cases}$
Predictor $n$:
$p_{n}(x) = \begin{cases} \text{YES}, & \text{if } (x \leq 0.50) \land (\text{random oracle queried on } (n, x) \text{ returns True}), \\ \text{YES}, & \text{if } 0.50 < x \leq 0.51, \\ \text{NO}, & \text{otherwise.} \end{cases}$
(If you want to make this more rigorous, replace the random oracle query with, e.g., the digits of normal numbers.)
(Analogous arguments apply in finite domains, so long as the number of possible predictors is relatively large compared to the number of actual predictors.)
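A minimal Python sketch of the counterexample. It assumes inputs are drawn uniformly from [0, 1] and reads the "99% safety guarantee" as "answers YES when the truth is NO on at most 1% of inputs". The `oracle` function is a hypothetical stand-in for the random oracle (a SHA-256 hash of (n, x)), and the names `predictor`, `intersection` and `unsafe_rate` are illustrative only. The point it checks is that intersecting many predictors leaves the unsafe rate where it was.

```python
import hashlib
import random


def ground_truth(x: float) -> bool:
    """f(x): True means YES."""
    return x <= 0.5


def oracle(n: int, x: float) -> bool:
    """Hypothetical stand-in for the random oracle: a deterministic
    pseudo-random bit derived from SHA-256 of (n, x)."""
    digest = hashlib.sha256(f"{n}:{x!r}".encode()).digest()
    return digest[0] & 1 == 1


def predictor(n: int, x: float) -> bool:
    """p_n(x): True means YES."""
    if x <= 0.50:
        return oracle(n, x)
    if x <= 0.51:
        return True  # the unsafe band, identical for every n
    return False


def intersection(ns, x: float) -> bool:
    """Answer YES only if every predictor in ns answers YES."""
    return all(predictor(n, x) for n in ns)


def unsafe_rate(classify, samples) -> float:
    """Fraction of samples where the classifier says YES but the truth is NO."""
    bad = sum(1 for x in samples if classify(x) and not ground_truth(x))
    return bad / len(samples)


rng = random.Random(0)
samples = [rng.random() for _ in range(100_000)]  # assumed uniform inputs

single = unsafe_rate(lambda x: predictor(0, x), samples)
combined = unsafe_rate(lambda x: intersection(range(50), x), samples)
print(f"single predictor unsafe rate:    {single:.4f}")
print(f"intersection of 50 unsafe rate:  {combined:.4f}")
```

The two rates come out identical, not merely close: the only unsafe region is $0.50 < x \leq 0.51$, and there $p_n$ does not depend on $n$, so no member of the ensemble ever vetoes it. Both rates sit near 0.01, matching the single-predictor guarantee.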
But then we can use completely different sensor data—camera data, lidar data, microphone data—for each of the pairs, and proceed that way. We can still iterate the overall scheme.
No two sets of sensor data are truly ‘completely different’. Among many other things, the laws of physics remain the same.