In the unlikely event that somebody doesn’t instantly give it a buggy order that accidentally destroys the world (like “get me some paperclips” or something), we arrive at the second paragraph of my (1).
If something that powerful is going to take human orders, there will be a power struggle over control of it, either before or after it’s turned on. As a result, it will end up doing the bidding of whoever is willing and able to seize power.
The whole scenario is set up so that the first “defector” takes the entire game. And it’s not an iterated game. Any approach that doesn’t take that into account relies on there being absolutely no defectors, which is completely crazy in any scenario with a meaningful number of players.
And even if I’m wrong about that, we still lose.
Look at the suggestion of going to a government, or worse, some kind of international organization, and saying “We have this ultra-powerful AGI, and we need you to use your legitimate processes to tell us what to do with it. We recommend using it to prevent other more dangerous AGIs from being used”. Your government might actually do that, if you’re incredibly lucky and it manages to make any decision at all soon enough to be useful.
Do you think it’s going to stop there? That government (actually better modeled as the specific people who happen to maneuver into the right places within that government) is going to keep giving other orders to that AGI. Eventually the process that selects the orders and/or the people giving them will fail, and the orders will go beyond “limited and concrete world-saving tasks” into something catastrophic. Probably sooner rather than later. And a coalition of AI companies wouldn’t be any better at resisting that than a government.
Human control of an AI leads to X-risk-realization, or more likely S-risk-realization, the first time somebody unwise happens to get their hands on the control panel.