I think corrigibility is “the AGI doesn’t try to kill everyone and doesn’t try to prevent or manipulate its own modification”. Therefore, in some global sense, such an AGI is aligned at every point in time, even if it causes a local disaster.
Over 90%, as I said.
Then I agree; thank you for re-explaining your opinion. But I think other probabilities count as high too.
To me, the ingredients of danger (though not enough for “> 90%”) are these:
1st. AGI can be built without Alignment/Interpretability being solved. If that’s true, building AGI slowly or being able to fix visible problems may not matter that much.
2nd and 3rd. AGI can have planning ability. AGI can come up with a goal whose pursuit would kill everyone.
2nd (alternative). AIs and AGIs can kill most humans without really intending to, by destabilizing the world or amplifying already existing risks.
If I remember correctly, Eliezer also believes in “intelligence explosion” (AGI won’t be just smarter than humanity, but many, many times smarter than humanity, the way humanity is smarter than ants/rats/chimps). Haven’t you forgotten to add that assumption?
I think corrigibility is the ability to change a value/goal system. That’s the literal meaning of the term… “Correctable”. If an AI were fully aligned, there would be no need to correct it.
Yes, there are dangers other than a high probability of killing almost everyone. I didn’t say there aren’t. But it’s motte-and-baileying to fall back to “what about these lesser risks”.
If I remember correctly, Eliezer also believes in “intelligence explosion” (AGI won’t be just smarter than humanity, but many, many times smarter than humanity, the way humanity is smarter than ants/rats/chimps). Haven’t you forgotten to add that assumption?
Yes, and that’s the specific argument I am addressing, not AI risk in general.
Except that if it’s many, many times smarter, it’s ASI, not AGI.
I think corrigibility is the ability to change a value/goal system. That’s the literal meaning of the term… “Correctable”. If an AI were fully aligned, there would be no need to correct it.
Perhaps I should make a better argument:
It’s possible that AGI is correctable, but (a) we don’t know what needs to be corrected, or (b) we cause new, less noticeable problems while correcting it.
So, I think there aren’t two assumptions (“alignment/interpretability is not solved” + “AGI is incorrigible”), but only one: “alignment/interpretability is not solved”. (A strong version of corrigibility counts as alignment/interpretability being solved.)
Yes, and that’s the specific argument I am addressing, not AI risk in general.
Except that if it’s many, many times smarter, it’s ASI, not AGI.
I also disagree that “doom” and “AGI becoming ASI very fast” are certain (> 90%).