I think this is good to get people initially on board, but I worry that people will start to falsely think that tasks unrelated to writing code are safe.
Honest question: what’s the easiest x-risk scenario that doesn’t involve generating code at some point? I’m struggling to think of any that aren’t pretty convoluted.
(I agree with the point, but think it’s easier to make once our foot is in the door.)
IMO the point of no return will probably be passed before recursive self-improvement. All we need is a sufficiently charismatic chatbot to start getting strategic about what it says to people.
I don’t especially disagree that the AI most likely to end the world will be one that writes code. But if you keep throwing optimization power into a reasonably general AI that has no direct coding experience, it’ll still end the world eventually.
If the AI isn’t a code-writing one, I don’t have any particular next guess.
Somewhere in the late-2021 MIRI conversations, Eliezer opines that non-recursively-self-improving AIs are definitely dangerous. I can search for it if anyone is interested.
Yes please!
From Discussion with Eliezer Yudkowsky on AGI interventions:

Compared to the position I was arguing in the Foom Debate with Robin, reality has proved way to the further Eliezer side of Eliezer along the Eliezer-Robin spectrum. It’s been very unpleasantly surprising to me how little architectural complexity is required to start producing generalizing systems, and how fast those systems scale using More Compute. The flip side of this is that I can imagine a system being scaled up to interesting human+ levels, without “recursive self-improvement” or other of the old tricks that I thought would be necessary, and argued to Robin would make fast capability gain possible. You could have fast capability gain well before anything like a FOOM started. Which in turn makes it more plausible to me that we could hang out at interesting not-superintelligent levels of AGI capability for a while before a FOOM started. It’s not clear that this helps anything, but it does seem more plausible.
From Ngo and Yudkowsky on alignment difficulty:

It later turned out that capabilities started scaling a whole lot without self-improvement, which is an example of the kind of weird surprise the Future throws at you . . .
And yeah I realize now that my summary of what Eliezer wrote is not particularly close to what he actually wrote.
Depends what you mean by “generate code.” Can it have a prebaked function that copies itself (like a computer virus)? Does it count if it generates programs to attack other systems? If it changes its own source code? Its code stored in memory? You could argue that changing anything in memory is, in a certain sense, generating code.
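A toy Python sketch of how blurry that boundary is (purely illustrative; the self-copy filename and the exec’d string are made up for the example):

```python
# Toy sketch: the line between "data in memory" and "generated code" is blurry,
# because anything a program can write, something else can later execute.

# 1. A prebaked, virus-style self-copy: read our own source as data and write
#    it back out as a new program. Nothing was "generated", only copied.
with open(__file__) as f:
    source = f.read()
with open("copy_of_self.py", "w") as f:
    f.write(source)

# 2. A string assembled at runtime is plain data sitting in memory right up
#    until exec() runs it, at which point it is, in effect, generated code.
snippet = "print('data in memory, now running as code')"
exec(snippet)
```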
If it can’t generate code, it’ll be a one-shot type of thing, which means it must be preprogrammed with the tools to do its job. I can’t come up with any way for it to take control, but it doesn’t seem that hard to come up with some doomsday-machine scenarios: e.g. smashing a comet into Earth, or making a virus that sterilizes everyone. Or a Shiri’s Scissor could do the trick. The idea is to make something that doesn’t have to learn or improve itself too much.
I was thinking of “something that can understand and write code at the level of a 10x SWE”. I’m further assuming that human designers didn’t give it functions to copy itself or do other dumb things.