cousin_it comments on Formalizing Value Extrapolation

cousin_it 26 Apr 2012 19:08 UTC
0 points

(Note that the hypothetical process probably doesn’t even output a goal specification, it just outputs a number, which the AI tries to control.)

That’s certainly a nice answer to the question “what’s the domain of the probability distribution and the utility function?” You just say that the utility function is a parameterless definition of a single number. But that seems to lead to the danger of the AI using acausal control to make the hypothetical human output a definition that’s easy to maximize. Do you think that’s unlikely to happen?

ETA: on further thought, this seems to be a pretty strong argument against the whole “indirect normativity” idea, regardless of what kind of object the hypothetical human is supposed to output.
- Vladimir_Nesov 26 Apr 2012 19:46 UTC
  8 points
  Parent
  The outer AGI doesn’t control the initial program if the initial program doesn’t listen to the outer AGI. It’s a kind of reverse AI box problem: the program that the AGI runs shouldn’t let the AGI in. This certainly argues that the initial program should take no input, and output its result blindly. That it shouldn’t run the outer AGI internally is then the same kind of AI safety consideration as that it shouldn’t run any other UFAI internally, so it doesn’t seem like an additional problem.
  - paulfchristiano 26 Apr 2012 20:46 UTC
    6 points
    Of course, once you are powerful enough you let the AGI in (or you define a utility function which invokes the AI, which is really no different), because this is how you control it.
    - Vladimir_Nesov 26 Apr 2012 20:54 UTC
      4 points
      I don’t understand your comment. [Edit: I probably do now.] You output something that the outer AGI uses to optimize the world as you intend; you don’t “let the AGI in”. You are living in its goal definition, and your decisions determine the AGI’s values.
      
      Are you perhaps referring to the idea that the AGI’s actions control its goal state? But you are not its goal state, you are a principle that determines its goal state, just as the AGI is. You show the AGI where to find its goal state, and the AGI starts working on optimizing it.
      - Wei Dai 26 Apr 2012 21:55 UTC
        6 points
        What’s the difference between the simulated humans outputting a utility function U’ which the outer AGI will then try to maximize, and the simulated humans just running U’ and the outer AGI trying to maximize the value returned by the whole simulation (and hence U’)? In the case of the latter, you’re “letting the AGI in” by including its definition (explicitly or implicitly via something like the universal prior) in the definition of U’.
        - Vladimir_Nesov 26 Apr 2012 22:23 UTC
          4 points
          OK, I see what Paul probably meant. Let’s say “utility value”, not “utility function”, since that’s what we mean. I don’t think we should be talking about “running utility value”, because utility might be something given by an abstract definition, not state of execution of any program. As I discussed in the grandparent, the distinction I’m making is between the outer AGI controlling utility value (which it does) and outer AGI controlling the simulated researchers that prepare the definition of utility value (which it shouldn’t be allowed to for AI safety reasons). There is a map/territory distinction between the definition of utility value prepared by the initial program and the utility value itself optimized by the outer AGI.
          - Wei Dai 26 Apr 2012 23:38 UTC
            4 points
            (Also, “utility function” might be confusing, especially for outsiders who are used to “utility function” meaning a mapping from world states to utility values, whereas Paul is using it to mean a parameterless computation that returns a utility value.)

            I don’t think we should be talking about “running utility value”, because utility might be something given by an abstract definition, not state of execution of any program.

            I think Paul is thinking that the utility definition that the simulated humans come up with is not necessarily a definition of our actual values, but just something that causes the outer AGI to self-modify into an FAI, and for that purpose it might be enough to define it using a programming language.

            As I discussed in the grandparent, the distinction I’m making is between the outer AGI controlling utility value (which it does) and outer AGI controlling the simulated researchers that prepare the definition of utility value (which it shouldn’t be allowed to for AI safety reasons).

            I think Paul’s intuition here is that the simulated humans (or enhanced humans and/or FAIs they build inside the simulation) may find it useful to “blur the lines”. In other words, the distinction you draw is not a fundamental one but just a safety heuristic that the simulated researchers may decide to discard or modify once they become “powerful enough”. For example, they may decide to partially simulate the outer AGI or otherwise try to reason about what it might do given various definitions of U’ the simulation might ultimately decide upon, once they understand enough theory to see how to do this in a safe way.
  - cousin_it 26 Apr 2012 20:11 UTC
    2 points
    
    That it shouldn’t run the outer AGI internally is then the same kind of AI safety consideration as that it shouldn’t run any other UFAI internally
    
    Good point. Thanks.