It seems to me that uploading tech would be a solution to AI risk, because a trusted team of uploads running at high speed can stop other AIs from arising and figure out the next steps. The first stage assistants proposed by Paul’s plan already require tech that’s pretty close to uploading tech, and will be very useful for developing uploading tech even without the later recursive stages. So the window of usefulness for the first stage seems small, and the window of usefulness for the later recursive stages seems even smaller. Am I missing something?
I don’t think it requires anything like uploading tech. It just involves training a model using RL (or RL + imitation learning), which is something we can do today.
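(To make that concrete, here is a minimal sketch of what “RL + imitation learning” could look like: pretrain a small policy on demonstrations from an overseer H, then fine-tune it with a simple policy-gradient step on a reward signal. The toy dimensions, the random stand-in data, and the reward function are all illustrative assumptions, not the actual training setup being proposed.)

```python
import torch
import torch.nn as nn

# Toy policy: maps an 8-dim observation to logits over 4 actions.
obs_dim, n_actions = 8, 4
policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, n_actions))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

# Stage 1: imitation learning on demonstrations from the overseer H
# (random tensors stand in for real (observation, action) pairs).
demo_obs = torch.randn(256, obs_dim)
demo_act = torch.randint(0, n_actions, (256,))
for _ in range(200):
    loss = nn.functional.cross_entropy(policy(demo_obs), demo_act)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Stage 2: RL fine-tuning with REINFORCE against a reward signal,
# standing in here for H's evaluation of the agent's behavior.
def reward(obs, act):
    # Placeholder reward: 1 if the chosen action matches the largest
    # observation component, 0 otherwise.
    return (act == obs.argmax(dim=-1)).float()

for _ in range(200):
    obs = torch.randn(64, obs_dim)
    dist = torch.distributions.Categorical(logits=policy(obs))
    act = dist.sample()
    loss = -(dist.log_prob(act) * reward(obs, act)).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```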
I thought your first stage assistants were supposed to be as good as humans at many tasks, including answering questions about their own thoughts. Is that much easier than imitating a specific human?
There is no requirement that the first stage assistant be human-level. I expect it will be superhuman in some respects and subhuman in others, just like existing AI.
From Ajeya Cotra’s post:

The Distill procedure robustly preserves alignment: Given an aligned agent H we can use narrow safe learning techniques to train a much faster agent A which behaves as H would have behaved, without introducing any misaligned optimization or losing important aspects of what H values.
This seems to say every step of IDA, including the first, requires a Distill procedure that’s at least strong enough to upload a human. Maybe I’m looking at the wrong post?
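(For reference, here is a rough schematic of the iterated distillation and amplification loop as I understand it; this is a paraphrase, not code from the post, and distill/amplify are deliberate placeholders.)

```python
# Schematic sketch of the IDA loop (paraphrase, not code from the post).
# distill trains a fast model to approximate its slower overseer using
# "narrow" techniques; amplify gives the human overseer many copies of
# the current agent to delegate subquestions to.

def distill(overseer):
    """Train a fast agent A to approximate the overseer's behavior
    (imitation learning / narrow RL); assumed to preserve alignment."""
    ...

def amplify(human, agent):
    """Return a slower but more capable overseer: the human answering
    questions with the help of many copies of the current agent."""
    ...

def ida(human, n_rounds):
    agent = distill(human)                # the first step discussed above
    for _ in range(n_rounds - 1):
        overseer = amplify(human, agent)  # H assisted by copies of agent
        agent = distill(overseer)         # compress the amplified overseer
    return agent
```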
I agree that “behaves as H would have behaved” seems wrong/sloppy. It’s referring to the “narrow” end of the spectrum introduced in that post (containing imitation learning, narrow RL, narrow IRL). So H is a rough upper bound on its intelligence.
The assumption is “The Distill procedure robustly preserves alignment.” You may think that’s only possible with an exact imitation, in which case I agree that you will never get off the ground.
In Ajeya’s defense, people do often use shorthand like describing an imitation learner’s behavior as “doing what the expert would do” without meaning to imply that it’s a perfect imitation. I agree in this case it’s unusually confusing.
It seems to me that doing that without “losing important aspects of what H values” would lead to something human-like anyway (though maybe not an exact imitation of H), because of complexity of value. Basically after the first step you get human-like entities running on computers. Then they can prevent AI risk and carefully figure out what to do next, same as a team of uploads. So the first step looks strategically similar to uploading, and solving stability for further steps might be unnecessary.
The resulting agent is supposed to be trying to help H get what it wants, but won’t generally encode most of H’s values directly (it will only encode them indirectly as “what the operator wants”).
I agree that Ajeya’s description in that paragraph is problematic (though I think the descriptions in the body of the post were mostly fine); I’ll probably correct it.
Then I’m not sure I understand how the scheme works. If all questions about values are punted to the single living human at the top, won’t that be a bottleneck for any complex plan?