What’s most worrying is that in your post If we solve alignment, do we die anyway? you mentioned your worries about multipolar scenarios. I am not sure we’d be much better off in a unipolar scenario, though. If there is one group of people controlling AGI, it might actually be even harder to make them give it up. They’d have a large amount of power and no real threat to it (no rival AGIs threatening to launch an attack).
However, I am not well-versed in the literature on this topic, so if there is any plan for how we can safeguard ourselves in such a scenario (unipolar AGI control), I’d be very happy to learn about it.
I think that’s a pretty reasonable worry. And a lot of people share it. Here’s my brief take.
Fear of centralized power vs. fear of misaligned AGI: Vitalik Buterin on 80,000 Hours
I’m less worried about that because it seems like one questionable group with tons of power is way better than a bunch of questionable groups with tons of power—if the offense-defense balance tilts toward offense, which I think it does. The more groups, the more chance that someone uses it for ill.
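To make that last intuition concrete, here’s a toy model (my framing, with made-up numbers, not anything from the post): if each AGI-controlling group independently has some small chance of using it offensively, the chance that at least one does grows quickly with the number of groups.

```python
# Toy model: chance that at least one AGI-controlling group acts
# destructively, assuming each group independently misuses its AGI
# with the same probability p. Both numbers are illustrative, not
# estimates of anything.

def p_any_misuse(n_groups: int, p_misuse: float) -> float:
    """P(at least one of n independent groups misuses AGI)."""
    return 1 - (1 - p_misuse) ** n_groups

for n in (1, 3, 10, 50):
    print(f"{n:>3} groups, 1% each -> {p_any_misuse(n, 0.01):.1%}")
# 1 group -> 1.0%; 10 groups -> ~9.6%; 50 groups -> ~39.5%
```

Under this (strong) independence assumption, an offense-favored world with fifty AGI holders is dramatically riskier than one with a single holder, even if every holder is equally questionable.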
Here’s one update on my thinking: mutually assured destruction will still work for most of the world. ICBMs with nuclear payloads will be obsoleted at some point, but AGIs will also likely be told to find even better/worse ways to destroy stuff. So possibly everyone with an AGI will go ahead and hold the whole earth hostage, just so whoever starts a war doesn’t get to keep anything they were keeping on the planet. That creates an incentive to get off-planet and possibly keep going.
It’s really hard to see how this stuff plays out, but I suspect that in retrospect the constraints, incentives, and distribution of psychologies will seem obvious. So I appreciate your help in thinking it through. We don’t have answers yet, but they may be out there.
I don’t think it would be much harder for a group to give it up if they were the only ones who had it. And maybe there’s not much difference between a full renunciation of control and just saying “oh fine, I’m tired of running the world, do whatever it seems like everybody wants but check major changes with me in case I decide to throw my weight around instead of hanging out in the land of infinite fun”.
After reading your comment, I do agree that a unipolar AGI scenario is probably better than a multipolar one. Perhaps I underestimated how offense-favored our world is.
That aside, your plan is possibly one of the clearest, most intuitive alignment plans I’ve seen. All of the steps make sense and seem decently likely to happen, except maybe for one: I am not sure your argument for why we have good odds of getting AGI into trustworthy hands holds up.
“It seems as though we’ve got a decent chance of getting that AGI into a trustworthy-enough power structure, although this podcast shifted my thinking and lowered my odds of that happening.
Half of the world, and the half that’s ahead in the AGI race right now, has been doing very well with centralized power for the last couple of centuries.”
I think that, actually, the half with the most centralized power is doing really poorly: it’s the half of the world that still has corrupt dictatorships and juntas. The West, relatively speaking, has a pretty decentralized system. Any important action has to pass through multiple stages of verification and approval, and failing even one of those stages is often enough to keep a bill from passing.
Furthermore, another potential problem I see is that even in democracies we still manage to elect selfish, corrupt, and power-hungry individuals, when the entire goal of the election system is to optimize for the opposite qualities. I am not sure how we can overcome that hurdle.
But I suspect I might’ve misunderstood your argument; if that’s the case, or if you have some other reasons for thinking we can get AGI into safe hands (and prevent the “totalitarian misuse” scenario), then I’d be more than happy to learn about them. I think this is the biggest bottleneck in the entire plan, and removing it would be really valuable.
I wish the odds of getting AGI into trustworthy hands were better. The source of my optimism is the hope that those hands just need to be decent: to have what I’ve conceptualized as a positive empathy-sadism balance. That’s anyone who’s not a total sociopath (lacking empathy and tending toward vengeance and competition) and/or sadist. I hope that about 90-99% of humanity would eventually make the world vastly better with their AGI, just because it’s trivially easy for them to do, so it requires only the smallest bit of goodwill.
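To put rough numbers on that hope (my own back-of-the-envelope sketch, treating the 90-99% figure above as an assumption): a unipolar world needs one good draw of who holds AGI, while a multipolar world needs every holder to be decent.

```python
# Back-of-the-envelope sketch (illustrative numbers only): if a
# fraction q of plausible AGI controllers are "decent" (positive
# empathy-sadism balance), a unipolar world needs one good draw,
# while a multipolar world with n holders needs all n to be decent.

def p_all_decent(q: float, n_holders: int) -> float:
    """P(every one of n independent AGI holders is decent)."""
    return q ** n_holders

for q in (0.90, 0.99):
    print(f"q={q:.2f}: unipolar {p_all_decent(q, 1):.1%}, "
          f"10 holders {p_all_decent(q, 10):.1%}")
# q=0.90: unipolar 90.0%, 10 holders ~34.9%
# q=0.99: unipolar 99.0%, 10 holders ~90.4%
```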
I wish I were more certain of that. I’ve looked a little at historical examples of rulers born into power with little risk of losing it. A disturbing number of them were quite callous. They were usually surrounded by a group of advisors who got them to ignore the plight of the masses and focus on the concerns of an elite few. But that situation isn’t analogous: once your AGI hits superintelligence, it would be trivially easy both to help the masses in profound ways and to pursue whatever crazy schemes you and your friends have come up with. Thus my limited optimism.
WRT the distributed power structure of Western governments: I think AGI would be placed under executive authority, like the armed forces, and the US president and those with similar roles in other countries would hold near-total power, should they choose to use it. They could transform democracies into dictatorships with ease. And we very much do continue to elect selfish and power-hungry individuals, some of whom probably actually have a negative empathy-sadism balance.
Looking back, I note that you said I argued for “good odds” while I said “decent odds”. We may be in agreement on the odds.
But there’s more to consider here. Thanks again for engaging; I’d like to get more discussion of this topic going. I doubt you or I are seeing all of the factors that will be obvious in retrospect yet.