This means that statistical testing methods (e.g. an evolutionary algorithm’s evaluation of average fitness on a set of test problems) are not suitable for self-modifications which can potentially induce catastrophic failure (e.g. of parts of code that can affect the representation or interpretation of the goals).
I’m confused by this sentence. There are many statistical testing methods that output what are essentially proofs; e.g. statements of the form “probability of a failure existing is at most 10^(-100)”. Why would this not be sufficient?
More generally, as I’ve said in another comment, I would really like to understand how the Löb obstacle relates to statistical learning methods, especially since those seem like our best guess as to what an AI paradigm would look like.
What sort of statistical testing method would output a failure probability of at most 10^(-100) for generic optimization problems without trying 10^100 examples? You can get this in some mathematical situations, but only because, if X doesn’t have property Y, then on each of many independent trials it has a 50% chance of exhibiting property Z. For more generic optimization problems, if you haven’t tested fitness on 10^100 occasions you can’t rule out a probability greater than 10^(-100) of some sort of possible blowup. And even if you do test 10^100 samples, the guarantee is only as strong as your belief that the samples were drawn from a probability distribution exactly the same as the real-world contexts likely to be encountered, down to the 100th decimal place.
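(An editorial illustration, not something from the original thread, of the kind of special mathematical situation being described: a Miller-Rabin primality test, where each independent random base catches a composite number with probability at least 3/4, so on the order of a couple hundred independent trials, not 10^100, push the error bound below 10^(-100). The guarantee still leans entirely on the trials being genuinely independent.)

```python
import math
import random

def miller_rabin(n: int, trials: int) -> bool:
    """Probabilistic primality test (illustrative sketch).

    If n is composite, each independent random base detects that with
    probability >= 3/4, so a composite survives all `trials` rounds with
    probability at most 4**(-trials)."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):
        if n % p == 0:
            return n == p
    d, r = n - 1, 0
    while d % 2 == 0:          # write n - 1 as d * 2^r with d odd
        d //= 2
        r += 1
    for _ in range(trials):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False       # witness found: definitely composite
    return True                # no witness found: probably prime

# 4^(-k) <= 10^(-100) needs only k = ceil(100 * ln 10 / ln 4) = 167 rounds.
k = math.ceil(100 * math.log(10) / math.log(4))
print(k, miller_rabin(2**127 - 1, k))   # 2^127 - 1 is prime
```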
It depends on the sort of guarantee you want. Certainly I can say things of the form “X and Y differ from each other in mean by at most 0.01” with a confidence that high, without 10^100 samples (as long as the samples are independent or at least not too dependent).
If your optimization problem is completely unstructured then you probably can’t do better than the number of samples you have, but if it is completely unstructured then you also can’t prove anything about it, so I’m not sure what point you’re trying to make. It seems a bit unimaginative to think that you can’t come up with any statistical structure to exploit, especially if you think there is enough mathematical structure to prove strong statements about self-modification.
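(To put a number on the claim above about bounding a difference of means: under the stated assumption that the samples are independent, and here also bounded, Hoeffding’s inequality turns “the means differ by at most 0.01, except with probability 10^(-100)” into a sample-size requirement in the millions rather than 10^100. This is an editorial sketch; the helper name and the pairing trick are just one way to set it up.)

```python
import math

def hoeffding_sample_size(epsilon: float, delta: float, value_range: float = 1.0) -> int:
    """Samples needed so that the empirical mean of i.i.d. values confined to an
    interval of width `value_range` lands within `epsilon` of the true mean,
    except with probability at most `delta`.  Two-sided Hoeffding bound:
        P(|mean_hat - mean| >= epsilon) <= 2 * exp(-2 * n * epsilon**2 / value_range**2)
    """
    return math.ceil(value_range**2 * (math.log(2.0) - math.log(delta)) / (2.0 * epsilon**2))

# "X and Y differ in mean by at most 0.01, except with probability 10^-100":
# pair the samples and apply the bound to Z_i = X_i - Y_i, which lies in [-1, 1]
# when X_i and Y_i are in [0, 1].
n = hoeffding_sample_size(epsilon=0.01, delta=1e-100, value_range=2.0)
print(n)   # roughly 4.6 million paired samples (nowhere near 10^100)
```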
If you can get me a conditionally independent failure probability of 10^-100 per self-modification by statistical techniques whose assumptions are true, I’ll take it and not be picky about the source. It’s the ‘true assumptions’ part that seems liable to be a sticking point. I understand how to get probabilities like this by doing logical-style reasoning on transistors with low individual failure probabilities and proving a one-wrong-number assumption over the total code (i.e., the total code still functions if any one instruction goes awry), but how else would you do that?
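(For what it’s worth, the transistor-style argument sketched here can be written as a union bound: if the code provably keeps working unless at least t of its N independently failing parts go wrong at once, the total failure probability is at most C(N, t) · p^t. An editorial sketch with purely illustrative numbers, not measured hardware figures:)

```python
import math

def failure_bound(n_parts: int, p_fault: float, faults_tolerated: int) -> float:
    """Union bound for a system proven to keep working as long as at most
    `faults_tolerated` of its `n_parts` parts go wrong, where each part fails
    independently with probability `p_fault`:
        P(system failure) <= C(n_parts, t) * p_fault**t,   t = faults_tolerated + 1
    (failure needs some set of t parts to all go wrong at once)."""
    t = faults_tolerated + 1
    return math.comb(n_parts, t) * p_fault**t

# One-wrong-number case: tolerate any single bad instruction.  With illustrative
# numbers (10^9 instructions, each independently corrupted with probability 10^-60)
# the bound is about C(10^9, 2) * 10^-120, i.e. roughly 5 * 10^-103, below 10^-100.
print(failure_bound(n_parts=10**9, p_fault=1e-60, faults_tolerated=1))
```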
This means that statistical testing methods (e.g. an evolutionary algorithm’s evaluation of average fitness on a set of test problems) are not suitable for self-modifications which can potentially induce catastrophic failure (e.g. of parts of code that can affect the representation or interpretation of the goals).
I’m confused by this sentence. There are many statistical testing methods that output what are essentially proofs; e.g. statements of the form “probability of a failure existing is at most 10^(-100)”. [...]
It seems as though they would involve a huge number of trials.
“Evolutionary” algorithms aren’t typically used to change fitness functions anyway. They are more usually associated with building representations of the world to make predictions with. This complaint would seem to only apply to a few “artificial life” models—in which all parts of the system are up for grabs.
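(A toy editorial illustration of the point above, not from the thread: in a standard evolutionary algorithm the fitness function is fixed by the designer and only the candidate solutions are mutated.)

```python
import random

def fitness(bits):                      # fixed objective ("OneMax"): the search never touches it
    return sum(bits)

def one_plus_one_ea(n_bits=50, generations=2000, seed=0):
    """Minimal (1+1) evolutionary algorithm: mutate, keep the child if it is
    at least as fit as the parent.  Only the candidate bit-string evolves."""
    rng = random.Random(seed)
    parent = [rng.randint(0, 1) for _ in range(n_bits)]
    for _ in range(generations):
        child = [b ^ (rng.random() < 1.0 / n_bits) for b in parent]   # per-bit flip mutation
        if fitness(child) >= fitness(parent):
            parent = child
    return fitness(parent)

print(one_plus_one_ea())   # typically reaches, or nearly reaches, the maximum of 50
```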
There are many statistical testing methods that output what are essentially proofs; e.g. statements of the form “probability of a failure existing is at most 10^(-100)”. Why would this not be sufficient?
(Approximate orders of magnitude:)
Number of atoms in the universe: 10^80
Number of atoms in a human being: 10^28
Number of humans that have existed: 10^10
Number of AGI-creating-level inventions expected to be made by humans: 10^0–10^1
Number of AGI-creating-level inventions expected to be made by 1% (10^-2) of the universe turned into computronium, with no more than human-level thought-to-matter efficiency, extrapolating linearly: 10^(80 − 2 − 10 − 28) = 10^40.
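(Spelling out the arithmetic in the list above in base-10 exponents, plus the union-bound step behind the 10^(-60) figure in the reply that follows; the variable names are just editorial bookkeeping.)

```python
log_atoms_in_universe      = 80
log_computronium_share     = -2    # using 1% of the universe
log_atoms_per_human        = 28
log_humans_so_far          = 10
log_inventions_by_humanity = 0     # low end of the 10^0 - 10^1 range

log_invention_events = (log_atoms_in_universe + log_computronium_share
                        - log_atoms_per_human
                        - log_humans_so_far
                        + log_inventions_by_humanity)
print(log_invention_events)                            # 40, as in the list

log_per_event_failure = -100
# union bound: 10^40 chances at 10^-100 each gives about 10^-60 overall
print(log_invention_events + log_per_event_failure)    # -60
```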
Hmm, that doesn’t sound that bad, but we got from 10^(-100) to 10^(-60) really fast. Also, I don’t think Eliezer was talking about that kind of statistical method.
I mean, I could easily make the 100 into a 400, so I don’t think this is that relevant.
Yes, the last sentence is probably my real “objection”. (Well, I don’t object to your statements, I just don’t think that’s what Eliezer meant. Even if you run a non-statistical, deterministic theorem prover, using current hardware the probability of failure is much above 10^-100.)
The silly part of the comment was just a reminder (partly to myself) that AGI problems can span orders of magnitude so ridiculously outside the usual human scale that one can’t quite approximate (the number of atoms in the universe)^-1 as zero without thinking carefully about it.