TimS comments on Superintelligent AGI in a box—a question.

TimS 25 Feb 2012 3:24 UTC
0 points
Well, one way to be a better optimizer is to ensure that one’s optimizations are actually implemented. When the program self-modifies, how do we ensure that this capacity is not created? The worst case scenario is that the program learns to improve its ability to persuade you that changes to the code should be authorized.

In short, allowing the program to “optimize” itself does not define what should be optimized. Deciding what should be optimized is the output of some function, so I suggest calling that the “utility function” of the program. If you don’t program it explicitly, you risk such a function appearing through unintended interactions of functions that were programmed explicitly.
- jacobt 25 Feb 2012 3:36 UTC
  0 points
  
  Well, one way to be a better optimizer is to ensure that one’s optimizations are actually implemented.
  
  No, changing program (2) to persuade the human operators will not give it a better score according to criterion (3).
  
  In short, allowing the program to “optimize” itself does not define what should be optimized. Deciding what should be optimized is the output of some function, so I suggest calling that the “utility function” of the program. If you don’t program it explicitly, you risk such a function appearing through unintended interactions of functions that were programmed explicitly.
  
  I assume you’re referring to the fitness function (performance on the training set) as a utility function. It is sort of like a utility function in that the program will try to find code for (2) that scores better on the fitness function. However, it will not do anything like persuading human operators to let it out in order to increase that score. It will only execute program (2) to find improvements. Since it’s not exactly a utility function in the VNM sense, it should not be called a utility function.
  - TimS 25 Feb 2012 4:18 UTC
    0 points
    
    allow the improvement if it makes it do better on average on the sample optimization problems without being significantly more complex (to prevent overfitting). That is, the fitness function would be something like (average performance - k * bits of optimizer program).
    
    Who exactly is doing the “allowing”? If it is the program, the criteria for allowing changes haven’t been rigorously defined. If it is the human, how are we verifying that there is an improvement in average performance? There is no particular guarantee that verifying an improvement will be easier than discovering it (by hypothesis, we couldn’t discover it without the program).
    - jacobt 25 Feb 2012 4:21 UTC
      0 points
      
      Who exactly is doing the “allowing”?
      
      Program (3), which is a dumb, non-optimized program. See this for how it could be defined.
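To make the role of program (3) concrete, here is a minimal sketch of a fixed, non-learning checker that applies the fitness rule quoted above (average performance - k * bits of optimizer program). Everything in it is an assumption for illustration: the names `average_performance`, `fitness`, and `allow_improvement` are hypothetical, an optimizer is modelled as a function that takes a problem and returns a proposed input, and "bits of optimizer program" is approximated by the length of its source text.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical types: a problem is an objective to maximize, and an
# optimizer maps a problem to a proposed input for it.
Problem = Callable[[float], float]
Optimizer = Callable[[Problem], float]
Candidate = Tuple[Optimizer, str]  # (callable, its source text)


def average_performance(optimizer: Optimizer, problems: Sequence[Problem]) -> float:
    """Mean objective value the optimizer reaches on the sample problems."""
    return sum(p(optimizer(p)) for p in problems) / len(problems)


def fitness(optimizer: Optimizer, source: str,
            problems: Sequence[Problem], k: float) -> float:
    """average performance - k * bits, with bits assumed to be source length."""
    bits = 8 * len(source.encode("utf-8"))
    return average_performance(optimizer, problems) - k * bits


def allow_improvement(current: Candidate, candidate: Candidate,
                      problems: Sequence[Problem], k: float = 1e-4) -> bool:
    """The 'dumb' checker: accept only if the fixed fitness score goes up."""
    return fitness(*candidate, problems, k) > fitness(*current, problems, k)


# Two toy sample problems with maxima at x = 1 and x = -2.
problems = [lambda x: -(x - 1) ** 2, lambda x: -(x + 2) ** 2]

# Current optimizer: always proposes 0.
current = (lambda p: 0.0, "lambda p: 0.0")

# Candidate optimizer: coarse grid search over [-3, 3].
grid_source = "lambda p: max((i / 10 for i in range(-30, 31)), key=p)"
candidate = (lambda p: max((i / 10 for i in range(-30, 31)), key=p), grid_source)

print(allow_improvement(current, candidate, problems))
```

The checker never inspects what the candidate "wants"; it only runs it on the sample problems and compares two numbers. A candidate that performs identically but carries a much longer source is rejected by the complexity penalty.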
      
      There is no particular guarantee that the verification of improvement will be easier than discovering the improvement (by hypothesis, we couldn’t discover the latter without the program).
      
      See this. Many useful problems are easy to verify and hard to solve.
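As an illustration of the "easy to verify, hard to solve" gap, here is a small sketch using subset sum. The choice of problem and the function names `verify` and `solve_brute_force` are assumptions, not something specified in the thread. Checking a proposed answer takes time linear in its size, while the only solver shown here tries subsets one by one, which is exponential in the worst case (no polynomial-time algorithm is known for the general problem).

```python
from itertools import combinations
from typing import Optional, Sequence, Tuple


def verify(numbers: Sequence[int], target: int, indices: Sequence[int]) -> bool:
    """Cheap check: do the chosen (distinct, in-range) indices sum to target?"""
    return (
        len(set(indices)) == len(indices)
        and all(0 <= i < len(numbers) for i in indices)
        and sum(numbers[i] for i in indices) == target
    )


def solve_brute_force(numbers: Sequence[int], target: int) -> Optional[Tuple[int, ...]]:
    """Expensive search: try every non-empty subset, smallest subsets first."""
    for size in range(1, len(numbers) + 1):
        for indices in combinations(range(len(numbers)), size):
            if sum(numbers[i] for i in indices) == target:
                return indices
    return None


numbers = [3, 34, 4, 12, 5, 2]
answer = solve_brute_force(numbers, 9)
print(answer, verify(numbers, 9, answer))
```

In the setting discussed above, the untrusted optimizer would play the role of `solve_brute_force` (or something much cleverer), while the fixed checker only ever needs `verify`.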