That is very likely what “safe” means. Instruction-following AGI is easier and more likely than value-aligned AGI, and it seems likely to become the default alignment goal as soon as someone thinks seriously about what they want their AGI aligned to.
As for whether it’s actually good for most people: it depends entirely on who in the NSA controls it. There are very probably both good (ethically typical) and bad (sociopathic/sadistic) people there.
I have a whole draft speculating on which people could be trusted to control the world by controlling an AGI as it becomes ASI; I think between 90 and 99% of people who have a “positive empathy-sadism balance” could be trusted with it. But I’m not at all sure; it depends on who they’re surrounded by and the circumstances. Being in conflict with other AGI wielders gives lots more room for negative emotions to dominate. And it could be bad for most people even if it’s good in the much longer run.
I’d be interested to see that draft as a post!
What fraction of humans in set X would you guess have a “positive empathy-sadism balance”, for
X = all of humanity?
X = people in control of (governmental) AGI projects?
I agree that the social environment / circumstances could have a large effect on whether someone ends up wielding power selfishly or benevolently. I wonder if there’s any way anyone concerned about x/s-risks could meaningfully affect those conditions.
I’m guessing[1] I’m quite a bit more pessimistic than you about what fraction of humans would produce good outcomes if they controlled the world.
[1] With a lot of uncertainty, due to ignorance of your models.
Yep, the big concern is that the more sociopathic people wind up in positions of power. However, I don’t think power is correlated with sadism, and hopefully it’s anticorrelated.
I’d guess 99% of humanity and like 95% of people in control of AGI projects. Maybe similar for those high in the US government—but not in dictatorships where I think sadism and sociopathy win.
I didn’t finish that post because I was becoming more uncertain while writing it. A lot of hereditary monarchs have been pretty good rulers (this seems like the closest historical analogy to having AGI-level power over the world). But a lot were really bad rulers, too. That seemed to happen when a social group around them just didn’t care about the commoners and got the monarchs interested in their own status games. That could happen with some who controlled an AGI. I guess they’re guaranteed to be less naive than hereditary monarchs since all the candidates are adults who’ve earned power. Hopefully that would make them more likely to at least occasionally consider the lot of the commoner.
One of the things that gave me some optimism was considering the long term. A lot of people are selfish and competitive now. But gaining absolute control would, over time, make them less competitive. And it would be so easy to benefit humanity, just by telling your slave AGI to go make it happen. A lot of people would enjoy being hailed as a benevolent hero who’s shepherded humanity into a new golden age.
Anyway, I’m not sure.
Thanks for the answer. It’s nice to get data about how other people think about this subject.
the big concern is that the more sociopathic people wind up in positions of power.
Agreed!
Do I understand correctly: You’d guess that
99% of humans have a “positive empathy-sadism balance”,
and of those, 90-99% could be trusted to control the world (via controlling ASI),
i.e., ~89-98% of humanity could be trusted to control the world with ASI-grade power?
If so, then I’m curious—and somewhat bewildered!—as to how you arrived at those guesses/numbers.
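For concreteness, here’s the arithmetic behind that composition as a quick Python sketch (a back-of-envelope check, assuming your two guesses simply multiply, i.e. that the two judgments are independent):

```python
# Back-of-envelope: the trustable fraction of humanity implied by the two guesses,
# treating them as independent and simply multiplying.
p_positive_balance = 0.99               # guessed fraction with a positive empathy-sadism balance
p_trusted_given_balance = (0.90, 0.99)  # guessed range: trustable, among those

lo = p_positive_balance * p_trusted_given_balance[0]   # 0.891
hi = p_positive_balance * p_trusted_given_balance[1]   # 0.9801
print(f"Implied trustable fraction: {lo:.1%} to {hi:.1%}")  # 89.1% to 98.0%
```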
I’m under the impression that narcissism and sadism have prevalences of very roughly 6% and 4%, respectively. See e.g. this post, or the studies cited therein. Additionally, probably something like 1% to 10% of people are psychopaths, depending on what criteria are used to define “psychopathy”. Even assuming there’s a lot of overlap, I think a reasonable guess would be that ~8% of humans have at least one of those traits. (Or 10%, if we include psychopathy.)
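To make the overlap reasoning explicit, here’s a rough sketch (the prevalences are the ones above; the independence and maximal-overlap cases are just illustrative bounds, not claims about the actual comorbidity):

```python
# Bounds on P(at least one dark trait), from the rough prevalences above.
p_narc, p_sad = 0.06, 0.04
p_psycho = 0.05  # somewhere in the 1-10% range, depending on criteria

# If the traits were independent:
p_any_indep = 1 - (1 - p_narc) * (1 - p_sad)   # ~9.8%
# With maximal overlap (every sadist is also a narcissist):
p_any_overlap = max(p_narc, p_sad)             # 6.0%
# "A lot of overlap" lands between these bounds, hence the ~8% guess.

# Adding psychopathy, independent case:
p_any3_indep = 1 - (1 - p_narc) * (1 - p_sad) * (1 - p_psycho)  # ~14.3%
print(f"{p_any_overlap:.1%} <= P(either trait) <= {p_any_indep:.1%}")
print(f"with psychopathy, if independent: {p_any3_indep:.1%}")
```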
I’m guessing you disagree with those statistics? If yes, what other evidence leads you to your different (much lower) estimates?
Do you believe that someone with (sub-)clinical narcissism, if given the keys to the universe, would bring about good outcomes for all (with probability >90%)? Why/how? What about psychopaths?
Do you completely disagree with the aphorism that “power corrupts, and absolute power corrupts absolutely”?
Do you think that having good intentions (and +0 to +3 SD intelligence) is probably enough for someone to produce good outcomes, if they’re given ASI-grade power?
FWIW, my guesstimates are that
over 50% of genpop would become corrupted by ASI-grade power, or are sadistic/narcissistic/psychopathic/spiteful to begin with,
of the remainder, >50% would fuck things up astronomically, despite their good intentions[1],
genetic traits like psychopathy and narcissism (not sure about sadism), and acquired traits like cynicism, are much more prevalent (~5x odds?) in people who will end up in charge of AGI projects, relative to genpop. OTOH, competence at not-going-insane is likely higher among them too.
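Composing the first two of those guesstimates, and converting the “~5x odds” into a probability (a rough sketch; the 8% base rate is my guess from above):

```python
# Implied bottom line from the guesstimates above.
p_not_corrupted = 0.5   # at most ~50% resist corruption / lack the dark traits
p_not_botched = 0.5     # at most ~50% of those avoid astronomical screw-ups
p_good = p_not_corrupted * p_not_botched
print(f"P(good outcome | random human gets ASI-grade power) < {p_good:.0%}")  # < 25%

# "5x odds" applied to the ~8% genpop base rate:
base_rate = 0.08
odds = base_rate / (1 - base_rate) * 5        # ~0.435
p_dark_agi_leaders = odds / (1 + odds)        # ~30%
print(f"Implied dark-trait prevalence among AGI-project leaders: ~{p_dark_agi_leaders:.0%}")
```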
it would be so easy to benefit humanity, just by telling your slave AGI to go make it happen. A lot of people would enjoy being hailed as a benevolent hero
I note that if someone is using an AGI as a slave, and is motivated by wanting prestige status, then I do not expect that to end well for anyone else. (Someone with moderate power, e.g. a medieval king, with the drive to be hailed a benevolent hero, might indeed do great things for other people. But someone with more extreme power—like ASI-grade power—could just… rewire everyone’s brains; or create worlds full of suffering wretches, for him to save and be hailed/adored by; or… you get the idea.)
Even relatively trivial things like social media or drugs mess lots of humans up; and things like “ability to make arbitrary modifications to your mind” or “ability to do anything you want, to anyone, with complete impunity” are even further OOD, and open up even more powerful superstimuli/reward-system hacks. Aside from tempting/corrupting humans to become selfish, I think that kind of situation has high potential to just lead to them going insane or breaking (e.g. start wireheading) in any number of ways.
And then there are other failure modes, like insufficient moral uncertainty and locking in some parochial choice of values, or a set of values that made sense in some baseline human context but which generalize to something horrible. (“Obviously we should fill the universe with Democracy/Christianity/Islam/Hedonism/whatever!”, … “Oops, turns out Yahweh is pretty horrible, actually!”)