Charbel-Raphaël comments on Thoughts on “AI is easy to control” by Pope & Belrose

Charbel-Raphaël 1 Jan 2024 21:09 UTC
2 points
0
What about Section 1?
- Nora Belrose 2 Jan 2024 1:50 UTC
  2 points
  −2
  Parent
  Our 1% doom number excludes misuse-flavored failure modes, so I considered it out of scope for my response. I think the fact that good humans have been able to keep rogue bad humans more-or-less under control for millennia is strong evidence that good AIs will be able to keep rogue AIs under control, and I think the evidence is pretty mixed on whether the so-called offense-defense balance will be skewed toward offense or defense— I weakly expect defense will be preferred, mainly through centralization-of-power effects.
  - Kabir Kumar 27 Mar 2025 12:02 UTC
    14 points
    0
    Parent
    the fact that good humans have been able to keep rogue bad humans more-or-less under control
    Isn’t stuff like the transatlantic slave trade, genocide of native americans, etc evidence that the amount isn’t sufficient??
  - Charbel-Raphaël 2 Jan 2024 10:46 UTC
    6 points
    1
    Parent
    It’s not strong evidence; it’s a big mess, and it seems really difficult to have any kind of confidence in such a fast-changing world. It feels to me that it’s going to be a roughly ⁵⁰⁄₅₀ bet. Saying the probability is 1% requires much more work that I’m not seeing, even if I appreciate what you are putting up.
    On the offense-defense balance, there is no clear winner in the comment sections here, neither here. We’ve already seen a takeover between two different roughly equal human civilizations (see the story of the conquistadors) under certain circumstances. And AGI is at least more dangerous than nuclear weapons, and we came pretty close to nuclear war several times. Covid seems to come from gain of function research, etc...
    On fast vs slow takeoff, it seems to me that fast takeoff breaks a lot of your assumptions, and I would assign much more than a 1% probability for fast takeoff. Even when you still embrace the compute-centric framework (which I find conservative), you still get wild numbers, like a two-digits probability of takeoff lasting less than a year. If so, we won’t have the time to implement defense strategies.
    - Nora Belrose 2 Jan 2024 12:41 UTC
      2 points
      −6
      Parent
      I don’t think it makes sense to “revert to a uniform prior” over {doom, not doom} here. Uniform priors are pretty stupid in general, because they’re dependent on how you split up the possibility space. So I prefer to stick fairly close to the probabilities I get from induction over human history, which tell me p(doom from unilateral action) << 50%
      I strongly disagree that AGI is “more dangerous” than nukes; I think this equivocates over different meanings of the term “dangerous,” and in general is a pretty unhelpful comparison.
      I find foom pretty ludicrous, and I don’t see a reason to privilege the hypothesis much.
      From the linked report:
      My best guess is that we go from AGI (AI that can perform ~100% of cognitive tasks as well as a human professional) to superintelligence (AI that very significantly surpasses humans at ~100% of cognitive tasks) in less than a year.
      I just agree with this (if “significantly” means like 5x or something), but I wouldn’t call it “foom” in the relevant sense. It just seems orthogonal to the whole foom discussion.
      - Charbel-Raphaël 2 Jan 2024 15:36 UTC
        3 points
        1
        Parent
        I don’t think it makes sense to “revert to a uniform prior” over {doom, not doom} here.
        I’m not using a uniform prior; the ⁵⁰⁄₅₀ thing is just me expressing my views, all things considered.
        I’m using a decomposition of the type:
        Does it want to harm us? Yes, because of misuses, ChaosGPT, wars, psychopaths firing in schools, etc...
        Can it harm us? This is really hard to tell.
        I strongly disagree that AGI is “more dangerous” than nukes; I think this equivocates over different meanings of the term “dangerous,” and in general is a pretty unhelpful comparison.
        Okay. Let’s be more precise: “An AGI that has the power to launch nukes is at least more powerful than nukes.” Okay, and now, how would AGI acquire this power? That doesn’t seem that hard in the present world. You can bribe/threaten leaders, use drones to kill a leader during a public visit, and then help someone to gain power and become your puppet during the period of confusion à la conquistadors. The game of thrones is complex and brittle; this list of coups is rather long, and the default for a civilization/family reigning in some kingdom is to be overthrown.
        I prefer to stick fairly close to the probabilities I get from induction over human history, which tell me p(doom from unilateral action) << 50%
        I don’t like the word “doom”. I prefer to use the expression ‘irreversibly messed up future’, inspired by Christiano’s framing (and because of anthropic arguments, it’s meaningless to look at past doom events to compute this proba).
        I’m really not sure what should be the reference class here. Yes, you are still living and the human civilization is still here but:
        Napoleon and Hitler are examples of unilateral actions that led to international wars.
        If you go from unilateral action to multilateral actions, and you allow stuff like collusion, things become easier. And collusion is not that wild, we already see this in Cicero: the AI playing as France, conspired with Germany to trick England.
        As the saying goes: “AI is a wonderful tool for the betterment of humanity; AGI is a potential successor species.” So maybe the reference class is more something like chimps, neanderthals or horses. Another reference class could be something like Slave rebellion.
        I find foom pretty ludicrous, and I don’t see a reason to privilege the hypothesis much.
        We don’t need the strict MIRI-like RSI foom to get in trouble. I’m saying if AI technology does not have the time to percolate in the economy, we won’t have the time to upgrade our infrastructure and add much more defense than what we have today, which seems to be the default.
        Nora Belrose 2 Jan 2024 23:22 UTC
        2 points
        1
        Parent
        
        because of anthropic arguments, it’s meaningless to look at past doom events to compute this proba
        
        I disagree; anthropics is pretty normal (https://www.lesswrong.com/posts/uAqs5Q3aGEen3nKeX/anthropics-is-pretty-normal)
  - Signer 10 Jan 2024 12:57 UTC
    4 points
    2
    Parent
    
    I think the fact that good humans have been able to keep rogue bad humans more-or-less under control for millennia is strong evidence that good AIs will be able to keep rogue AIs under control
    
    Why? Like, what law of nature says that the trend in this terms should continue?
    - Nora Belrose 10 Jan 2024 23:07 UTC
      −30 points
      −21
      Parent
      Game theory
      - Signer 11 Jan 2024 5:22 UTC
        2 points
        0
        Parent
        Yes, but available strategies can change for AI vs humans—why assume they will be the same?
        
        Induction from history depends on it’s interpretation—we have more information than 1111111111 over {bad, not-so-bad}. It just feels like at present point the crux between optimists and doomers is not about whether white box access or trained mind-space is better, about how much it all updates you from what prior.