Great question. I was thinking of adding an edit to the end of the post with conclusions based on the comments/discussion. Here’s a draft:
None of the suggestions in the comments seemed to me like workable ways to solve the problem.
I think we could survive an n-way multipolar scenario if n is small—like a handful of ASIs controlled by a few different governments. But not indefinitely—unless those ASIs come up with coordination strategies no human has yet thought of (or at least none argued convincingly enough that I've heard of them—this isn't really my area, but nobody has pointed to any strong possibilities in the comments).
So my conclusion was more on the side that it's going to be such an obviously bad/dangerous scenario that it won't be allowed to happen.
Basically, the hope is that this all becomes viscerally obvious to the first people who speak with a superhuman AGI and who think about global politics. They’ll pull their shit together, as humans sometimes do when they’re motivated to actually solve hard problems.
Here’s one scenario in which multipolarity is stopped. Similar scenarios apply if the number of AGIs is small and people coordinate well enough to use their small group of AGIs similarly to what I’ll describe below.
The people who speak to the first AGI(s) and realize what must be done will include people in the government, because of course they'll be demanding to be included in decisions about using AGI. They'll talk sense to leadership, and the government will declare that this shit is deathly dangerous, and that nobody else should be building AGI.
They’ll call for a voluntary global moratorium on AGI projects. Realizing that this will be hugely unpopular, they’ll promise that the existing AGI will be used to benefit the whole world. They’ll then immediately deploy that AGI to identify and sabotage projects in other countries. If that’s not adequate, they’ll use minimal force. False-flag operations framing anti-AGI groups might be used to destroy infrastructure and assassinate key people involved in foreign projects. Or who knows.
The promise to benefit the whole world will be halfway kept. The AGI will be used to develop military technology and production facilities for the government that controls it; but it will simultaneously be used to develop useful technologies that address the problems most pressing for other governments. That could be useful tool AI, climate geoengineering, food production, etc.
The government controlling AGI keeps their shit together enough that no enterprising sociopath seizes personal control and anoints themselves god-emperor for eternity. They realize that this will happen eventually if their now-ASI keeps following human orders. They use its now-well-superhuman intelligence to solve value alignment sufficiently well to launch it or a successor as a fully autonomous sovereign ASI.
Humanity prospers under their sole demigod until the heat death of the universe, or until an unaligned expansionist AGI crosses our lightcone and turns everyone into paperclips. It will be a hell of a party for a subjectively very long time indeed. The one unbreakable rule will be that thou shalt worship no other god. All of humanity everywhere is monitored by copies of the sovereign AGI to prevent them from building new AGIs that aren't certified-aligned by the servant-god ASI. But since it's aligned and smart, it's pretty cool about the whole thing. So nobody minds that one rule much, given how much fun they're having building everything and having every experience imaginable, within the consent of all sentient entities involved.
I'd love to get more help thinking about how likely the central premise is—that people get their shit together once they're staring real AGI in the face. And about what we can do now to encourage that.