I don’t get how you can arrive at 0.1% for future AI systems even if NNs are biased against scheming. Humans scheme, and future AI systems trained to be capable of long if-then chains may also learn to scheme, perhaps because explicitly changing those biases is good for performance. Or do you really put <0.1% on future AI systems not using NNs at all?
Also, I’m not saying “but it doesn’t matter”; suppose everyone agrees that a spectrally biased NN with a classifier (or whatever) is a promising model of a safe system. Do you then propose that we should not worry and just build the most advanced AI we can as fast as possible? Or would it be better to first reduce the remaining uncertainty about the behavior of future systems?
I’m saying <0.1% chance on “world is ended by spontaneous scheming.” I’m not saying no AI will ever do anything that might be well-described as scheming, for any reason.
The exact language you use in the post is:
We therefore conclude that we should assign very low credence to the spontaneous emergence of scheming in future AI systems— perhaps 0.1% or less.
I personally think there is a moderate gap (perhaps a factor of 3) between “world is ended by serious[1] spontaneous scheming” and “serious spontaneous scheming”. And I could imagine updating to a factor of 10 if the world seemed better prepared, etc. So it might be good to clarify this in the post. (Or to clarify your comment.)
(I think spontaneous scheming (prior to human obsolescence) is ~25% likely, and x-risk due to this scheming, conditional on being in one of those worlds, is about 30% likely, for an overall ~8% on “world is ended by serious spontaneous scheming” (prior to human obsolescence).)
serious = somewhat persistent, thoughtful, etc.
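For reference, a quick check of the multiplication in the parenthetical above (25% times 30%, rounding to the stated ~8%):

```python
# Estimates stated in the parent comment (not my own numbers):
# P(spontaneous scheming before human obsolescence) ~ 25%
# P(x-risk | a world with that scheming)            ~ 30%
p_scheming = 0.25
p_doom_given_scheming = 0.30

p_doom = p_scheming * p_doom_given_scheming
print(p_doom)  # ~0.075, i.e. roughly the ~8% headline figure
```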
EDIT: This is wrong. See descendant comments.
I spent a bunch of time wondering how you could put 99.9% on no AI ever doing anything that might be well-described as scheming for any reason. I was going to challenge you to list a handful of other claims that you had similar credence in, until I searched the comments for “0.1%” and found this one.
I’m annoyed at this, and I request that you prominently edit the OP.
The post says “we should assign very low credence to the spontaneous emergence of scheming in future AI systems— perhaps 0.1% or less.”
I.e., not “no AI will ever do anything that might be well-described as scheming, for any reason.”
It should be obvious that, if you train an AI to scheme, you can get an AI that schemes.
Damn, woops.
My comment was false (and strident; worst combo). I accept the strong downvote and I will try to now make a correction.
I said:
I spent a bunch of time wondering how you could put 99.9% on no AI ever doing anything that might be well-described as scheming for any reason.
What I meant to say was:
I spent a bunch of time wondering how you could put 99.9% on no AI ever doing anything that might be well-described as scheming for any reason, even if you stipulate that it must happen spontaneously.
And now you have also commented:
Well, I have <0.1% on spontaneous scheming, period. I suspect Nora is similar and just misspoke in that comment.
So… I challenge you to list a handful of other claims that you have similar credence in. Special Relativity? P!=NP? Major changes in our understanding of morality or intelligence or mammal psychology? China pulls ahead in AI development? Scaling runs out of steam and gives way to other approaches like mind uploading? Major betrayal against you by a beloved family member?
The OP simply says “future AI systems” without specifying anything about these systems, their paradigm, or what offworld colony they may or may not be developed on. Just...all AI systems henceforth forever. Meaning that no AI creators will ever accidentally recapitulate the scheming that is already observed in nature...? That’s such a grand, sweeping claim. If you really think it’s true, I just don’t understand your worldview. If you’ve already explained why somewhere, I hope someone will link me to it.
Agree with this hugely (though I could make a partial defense of the confidence given), but yes, I’d like this post to be hugely edited.
What do you mean “hugely edited”? What other things would you like us to change? If I were starting from scratch I would of course write the post differently, but I don’t think it would be worth my time to make major post hoc edits; I would like to focus on follow-up posts.
Specifically, I wanted the edit to be a clarification that you only have a <0.1% probability on spontaneous scheming ending the world.
Well, I have <0.1% on spontaneous scheming, period. I suspect Nora is similar and just misspoke in that comment.
If it’s spontaneous then yeah, I don’t expect it to happen ~ever really. I was mainly thinking about cases where people intentionally train models to scheme.