Q Home comments on AI #27: Portents of Gemini

Q Home 1 Dec 2023 22:29 UTC
1 point
0
Yes, I probably mean something other than “>90%”.

[lists of various catastrophes, many of which have nothing to do with AI]

Why are you doing this? I did not say there is zero risk of anything. (...) Are you using “risk” to mean the probability of the outcome, or the impact of the outcome?

My argument is based on comparing the phenomenon of AGI to other dangerous phenomena. The argument is intended to show that a bad outcome is likely (if AGI wants to do a bad thing, it can achieve it) and that the outcome could kill most humans.

I think it’s needed for the “likely”. Slow takeoff gives humans more time to notice and fix problems, so the likelihood of bad outcomes goes down. Wasn’t that obvious?

To me the likelihood doesn’t go down enough (to tolerable levels).
- TAG 2 Dec 2023 18:03 UTC
  2 points
  0
  Parent
  
  The argument is intended to show that a bad outcome is likely (if AGI wants to do a bad thing, it can achieve it) and that the outcome could kill most humans.
  
  But I am not saying that doom is unlikely given superintelligence and misalignment. I am saying that the argument that gets there (superintelligence + misalignment) is highly conjunctive. The final step, the execution as it were, is not highly conjunctive.
  
  To me the likelihood doesn’t go down enough (to tolerable levels).
  
  Why not?
  - Q Home 3 Dec 2023 0:10 UTC
    1 point
    0
    Parent
    I’ve confused you with people who deny that a misaligned AGI is even capable of killing most humans. Glad to be wrong about you.
    
    But I am not saying that doom is unlikely given superintelligence and misalignment. I am saying that the argument that gets there (superintelligence + misalignment) is highly conjunctive. The final step, the execution as it were, is not highly conjunctive.
    
    But I don’t agree that it’s highly conjunctive.
    
    If AGI is possible, then its superintelligence is a given. Superintelligence isn’t given only if AGI stops at human level of intelligence + can’t think much faster than humans + can’t integrate abilities of narrow AIs naturally. (I.e. if AGI is basically just a simulation of a human and has no natural advantages.) I think most people don’t believe in such AGI.
    I don’t think misalignment is highly conjunctive.
    
    I agree that hard takeoff is highly conjunctive, but why is “superintelligence + misalignment” highly conjunctive?
    
    I think it’s needed for the “likely”. Slow takeoff gives humans more time to notice and fix problems, so the likelihood of bad outcomes goes down. Wasn’t that obvious?
    
    If AGI is AGI, there won’t be any problems to notice. That’s why I think probability doesn’t decrease enough.
    
    ...
    
    I hope that Alignment is much easier to solve than it seems. But I’m not sure (a) how much weight to put into my own opinion and (b) how much my probability of being right decreases the risk.
    - TAG 3 Dec 2023 4:15 UTC
      3 points
      1
      Parent
      
      If AGI is possible, then its superintelligence is a given
      
      It needs to happen quickly or surreptitiously to be a problem.
      
      I don’t think misalignment is highly conjunctive
      
      Incorrigible misalignment is at least one extra assumption.
      
      why is “superintelligence + misalignment” highly conjunctive?
      
      In the sense that matters, it needs to be fast, surreptitious, incorrigible, etc.
      
      If AGI is AGI, there won’t be any problems to notice
      
      Huh?
      - Q Home 4 Dec 2023 2:15 UTC
        1 point
        0
        Parent
        
        why is “superintelligence + misalignment” highly conjunctive?
        
        In the sense that matters, it needs to be fast, surreptitious, incorrigible, etc.
        
        What opinion are you currently arguing? That the risk is below 90% or something else? What counts as “high probability” for you?
        
        Incorrigible misalignment is at least one extra assumption.
        
        I think “corrigible misalignment” doesn’t exist: a corrigible AGI is already aligned (unless AGI can kill everyone very fast by pure accident). But we can have differently defined terms. To avoid confusion, please give examples of scenarios you’re thinking about. The examples can be very abstract.
        
        If AGI is AGI, there won’t be any problems to notice
        
        Huh?
        
        I mean, you haven’t explained what “problems” you’re talking about. AGI suddenly declaring “I think killing humans is good, actually” after looking aligned for 1 year? If you didn’t understand my response, a more respectful answer than “Huh?” would be to clarify your own statement. What noticeable problems did you talk about in the first place?
        
        Please, proactively describe your opinions. Is it too hard to do? Conversation takes two people.
        TAG 4 Dec 2023 16:56 UTC
        5 points
        0
        Parent
        
        What counts as “high probability” for you?
        
        Over 90%, as I said.
        
        I think “corrigible misalignment” doesn’t exist, corrigible AGI is already aligned
        
        It’s not aligned at every possible point in time.
        
        please give examples of scenarios you’re thinking about.
        
        I’m talking about the Foom scenario that has been discussed endlessly here.
        
        The complete argument for Foom Doom is that:
        
        the AI will have goals/values in the first place (it won’t be a tool like GPT*),
        the values will be misaligned, however subtly, in a way unfavorable to humanity,
        the misalignment cannot be detected or corrected,
        the AI can achieve value stability under self-modification,
        the AI will self-modify too fast to stop,
        and most misaligned values in the resulting ASI are highly dangerous.
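        
        A rough worked illustration of why a conjunctive argument is hard to push above 90%. Call the six claims above $A_1, \dots, A_6$. The per-step value of 0.9 used below is a placeholder chosen only to make the arithmetic concrete; it is not an estimate made by anyone in this thread.
        
        ```latex
        % Chain rule: the conjunction is the product of the conditional steps.
        \begin{align*}
        P(A_1 \wedge \dots \wedge A_6) &= \prod_{i=1}^{6} P(A_i \mid A_1, \dots, A_{i-1}) \\
        % Assumed placeholder: every conditional step has probability 0.9.
        0.9^{6} &\approx 0.53 \\
        % With six equal steps p, the conjunction exceeds 0.9 only if
        p &\ge 0.9^{1/6} \approx 0.983
        \end{align*}
        ```
        
        Under that assumption, even steps that each look very likely leave the whole chain well short of 90%, which is the sense of “highly conjunctive” used here.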
        Q Home 4 Dec 2023 22:41 UTC
        1 point
        0
        Parent
        
        It’s not aligned at every possible point in time.
        
        I think corrigibility is “AGI doesn’t try to kill everyone and doesn’t try to prevent/manipulate its modification”. Therefore, in some global sense such AGI is aligned at every point in time. Even if it causes a local disaster.
        
        Over 90%, as I said.
        
        Then I agree, thank you for re-explaining your opinion. But I think other probabilities count as high too.
        
        To me, the ingredients of danger (but not “> 90%”) are these:
        
        1st. AGI can be built without Alignment/Interpretability being solved. If that’s true, building AGI slowly or being able to fix visible problems may not matter that much.
        2nd and 3rd. AGI can have planning ability. AGI can come up with a goal whose pursuit would kill everyone.
        2nd (alternative). AIs and AGIs can kill most humans without real intention of doing so, by destabilizing the world/amplifying already existing risks.
        
        If I remember correctly, Eliezer also believes in “intelligence explosion” (AGI won’t be just smarter than humanity, but many-many times smarter than humanity: like humanity is smarter than ants/rats/chimps). Haven’t you forgotten to add that assumption?
        TAG 5 Dec 2023 1:32 UTC
        2 points
        1
        Parent
        I think corrigibility is the ability to change a value/goal system. That’s the literal meaning of the term: “correctable”. If an AI were fully aligned, there would be no need to correct it.
        
        Yes, there are dangers other than a high probability of killing almost everyone. I didn’t say there aren’t. But it’s a motte-and-bailey move to fall back to “what about these lesser risks”.
        
        If I remember correctly, Eliezer also believes in “intelligence explosion” (AGI won’t be just smarter than humanity, but many-many times
        
        Yes, and that’s the specific argument I am addressing, not AI risk in general.
        
        Except that if it’s many many times smarter, it’s ASI, not AGI.
        Q Home 5 Dec 2023 23:33 UTC
        1 point
        0
        Parent
        
        I think corrigibility is the ability to change a value/goal system. That’s the literal meaning of the term: “correctable”. If an AI were fully aligned, there would be no need to correct it.
        
        Perhaps I should make a better argument:
        
        It’s possible that AGI is correctable, but (a) we don’t know what needs to be corrected or (b) we cause new, less noticeable problems while correcting it.
        
        So, I think there aren’t two assumptions, “alignment/interpretability is not solved + AGI is incorrigible”, but only one: “alignment/interpretability is not solved”. (A strong version of corrigibility counts as alignment/interpretability being solved.)
        
        Yes, and that’s the specific argument I am addressing, not AI risk in general. Except that if it’s many many times smarter, it’s ASI, not AGI.
        
        I also disagree that “doom” and “AGI going ASI very fast” are certain (> 90%).