I don’t think it makes sense to “revert to a uniform prior” over {doom, not doom} here. Uniform priors are pretty stupid in general, because they’re dependent on how you split up the possibility space. So I prefer to stick fairly close to the probabilities I get from induction over human history, which tell me p(doom from unilateral action) << 50%
I strongly disagree that AGI is “more dangerous” than nukes; I think this equivocates over different meanings of the term “dangerous,” and in general is a pretty unhelpful comparison.
I find foom pretty ludicrous, and I don’t see a reason to privilege the hypothesis much.
From the linked report:
My best guess is that we go from AGI (AI that can perform ~100% of cognitive tasks as well as a human professional) to superintelligence (AI that very significantly surpasses humans at ~100% of cognitive tasks) in less than a year.
I just agree with this (if “significantly” means like 5x or something), but I wouldn’t call it “foom” in the relevant sense. It just seems orthogonal to the whole foom discussion.
I don’t think it makes sense to “revert to a uniform prior” over {doom, not doom} here.
I’m not using a uniform prior; the 50/50 thing is just me expressing my views, all things considered.
I’m using a decomposition of this type:
Does it want to harm us? Yes, because of misuse, ChaosGPT, wars, psychopaths shooting up schools, etc.
Can it harm us? This is really hard to tell.
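Written out as a rough sketch (this is schematic, not a precise model):

$$P(\text{AI harms us}) = P(\text{it wants to harm us}) \times P(\text{it can harm us} \mid \text{it wants to})$$

The first factor looks close to 1 to me for the reasons above, so the overall estimate is dominated by the second factor, which is the part that is genuinely hard to call; that is roughly where my all-things-considered 50/50 comes from.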
I strongly disagree that AGI is “more dangerous” than nukes; I think this equivocates over different meanings of the term “dangerous,” and in general is a pretty unhelpful comparison.
Okay, let’s be more precise: “An AGI that has the power to launch nukes is at least as powerful as nukes.” Now, how would an AGI acquire this power? That doesn’t seem that hard in the present world: you can bribe or threaten leaders, use drones to kill a leader during a public visit, and then help someone seize power and become your puppet during the ensuing period of confusion, à la conquistadors. The game of thrones is complex and brittle; the list of historical coups is rather long, and the default for a civilization or ruling family reigning over some kingdom is to eventually be overthrown.
I prefer to stick fairly close to the probabilities I get from induction over human history, which tell me p(doom from unilateral action) << 50%
I don’t like the word “doom”. I prefer the expression “irreversibly messed-up future”, inspired by Christiano’s framing (and anthropic arguments make it meaningless to look at past doom events to compute this probability: we could only ever observe a history in which doom hasn’t already happened).
I’m really not sure what the reference class should be here. Yes, you are still alive and human civilization is still here, but:
Napoleon and Hitler are examples of unilateral actors whose actions led to international wars.
If you go from unilateral action to multilateral action and allow things like collusion, things become easier. And collusion is not that far-fetched; we already see it in Cicero: the AI, playing as France, conspired with Germany to trick England.
As the saying goes, “AI is a wonderful tool for the betterment of humanity; AGI is a potential successor species.” So maybe the reference class is more something like chimps, Neanderthals, or horses. Another reference class could be something like slave rebellions.
I find foom pretty ludicrous, and I don’t see a reason to privilege the hypothesis much.
We don’t need the strict MIRI-style RSI foom to get into trouble. I’m saying that if AI technology doesn’t have time to percolate through the economy, we won’t have time to upgrade our infrastructure and add much more defense than we have today, and that seems to be the default.
because of anthropic arguments, it’s meaningless to look at past doom events to compute this probability
I disagree; anthropics is pretty normal (https://www.lesswrong.com/posts/uAqs5Q3aGEen3nKeX/anthropics-is-pretty-normal)