Eliezer Yudkowsky comments on Tiling Agents for Self-Modifying AI (OPFAI #2)

Eliezer Yudkowsky 7 Jun 2013 20:30 UTC
4 points
What sort of statistical testing method would output a failure probability of at most 10^(-100) for generic optimization problems without trying 10^100 examples? You can get this in some mathematical situations, but only because if X doesn’t have property Y then it has an independent 50% chance of showing property Z on each of many different trials of Z. For more generic optimization problems, if you haven’t tested fitness on 10^100 occasions you can’t rule out a >10^(-100) probability of any sort of possible blowup. And even if you test 10^100 samples, the guarantee is only as strong as your belief that the samples were taken from a probability distribution exactly the same as real-world contexts likely to be encountered, down to the 100th decimal place.
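The sample-count contrast above can be made concrete with a back-of-envelope sketch in Python (the function names are hypothetical, chosen for illustration). It assumes trials are independent draws from the same distribution as real-world use, which is exactly the assumption the comment questions, and uses an assumed 95% confidence level. With no exploitable structure, certifying a failure probability of at most 10^(-100) from zero observed failures takes about 3 × 10^100 trials; if instead each independent check catches a defect with probability 1/2, 333 checks push the miss probability below 10^(-100).

```python
import math


def zero_failure_trials_needed(p_max: float, confidence: float = 0.95) -> float:
    """Number n of independent, identically distributed trials that must all
    pass before a one-sided binomial bound certifies failure probability <= p_max.

    Solves (1 - p_max)**n <= 1 - confidence for n. log1p keeps the
    computation accurate when p_max is tiny (e.g. 1e-100).
    """
    return math.log(1.0 - confidence) / math.log1p(-p_max)


def amplified_trials_needed(target: float, detect_prob: float = 0.5) -> int:
    """Smallest k with (1 - detect_prob)**k <= target: the number of independent
    checks needed when a defective object is exposed by each check with
    probability detect_prob (the 'property Z' situation described above)."""
    return math.ceil(math.log(target) / math.log(1.0 - detect_prob))


if __name__ == "__main__":
    print(f"{zero_failure_trials_needed(1e-100):.3g} trials with zero failures")
    print(amplified_trials_needed(1e-100), "independent 50% checks")
```

The gap between the two numbers is the point: the unstructured case needs a number of trials proportional to 1/p, while independent amplification needs only about log2(1/p).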
- jsteinhardt 7 Jun 2013 21:45 UTC
  4 points
  It depends on the sort of guarantee you want. Certainly I can say things of the form “X and Y differ from each other in mean by at most 0.01” with a confidence that high, without 10^100 samples (as long as the samples are independent or at least not too dependent).
  
  If your optimization problem is completely unstructured then you probably can’t do better than the number of samples you have, but if it is completely unstructured then you also can’t prove anything about it, so I’m not sure what point you’re trying to make. It seems a bit unimaginative to think that you can’t come up with any statistical structure to exploit, especially if you think there is enough mathematical structure to prove strong statements about self-modification.
  - Eliezer Yudkowsky 7 Jun 2013 21:50 UTC
    5 points
    If you can get me a conditionally independent failure probability of 10^-100 per self-modification by statistical techniques whose assumptions are true, I’ll take it and not be picky about the source. It’s the ‘true assumptions’ part that seems liable to be a sticking point. I understand how to get probabilities like this by doing logical-style reasoning on transistors with low individual failure probabilities and proving a one-wrong-number assumption over the total code (i.e., the total code still functions if any one instruction goes awry), but how else would you do that?
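The two quantitative claims in this exchange can be checked with a short sketch in Python; the function names and all numeric inputs below are hypothetical, for illustration only. For jsteinhardt's mean-difference claim, Hoeffding's inequality (assuming independent paired differences bounded in [-1, 1], so an interval of width 2) gives a sample count that grows with ln(1/δ) rather than 1/δ: about 4.6 million samples pin the mean difference to within 0.01 except with probability 10^(-100). It says nothing about whether those samples match real-world contexts. For the redundancy argument, if each of n instructions fails independently with probability q and the code tolerates any single failure, a union bound over pairs caps the chance of two or more failures at C(n, 2)·q², roughly squaring the per-instruction rate.

```python
import math


def hoeffding_samples(epsilon: float, delta: float, value_range: float = 1.0) -> int:
    """Samples n so that, by Hoeffding's inequality, the empirical mean of n
    independent draws bounded in an interval of width `value_range` lies
    within `epsilon` of the true mean except with probability at most `delta`:

        P(|mean_n - mu| >= epsilon) <= 2 * exp(-2 * n * epsilon**2 / value_range**2)
    """
    return math.ceil(value_range**2 * math.log(2.0 / delta) / (2.0 * epsilon**2))


def pairwise_failure_bound(n_instructions: int, q: float) -> float:
    """Union bound over all pairs on P(two or more of n independent components
    fail), each failing with probability q. This is the failure probability that
    matters when the code is proven to survive any single faulty instruction."""
    return math.comb(n_instructions, 2) * q**2


if __name__ == "__main__":
    # Paired differences X - Y assumed to lie in [-1, 1], hence width 2.
    print(hoeffding_samples(0.01, 1e-100, value_range=2.0), "samples")
    # Hypothetical sizes: 10**9 instructions, per-instruction failure 1e-60.
    print(f"single-fault bound:  {10**9 * 1e-60:.1e}")
    print(f"two-fault bound:     {pairwise_failure_bound(10**9, 1e-60):.1e}")
```

Both results lean on independence: the Hoeffding count on independent samples, and the pairwise bound on independent component failures, which is the "true assumptions" question raised above.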