[Question] Will an Overconfident AGI Mistakenly Expect to Conquer the World?


I’m wondering how selection effects will influence the first serious attempt by an AGI to take over the world.

My question here is inspired by people who argue that an AGI couldn't conquer the world because it would depend on humans to provide electricity, semiconductors, etc.

For the purposes of this post, I'm assuming that AI becomes smarter than any single human at most tasks, via advances comparable to the kinds of innovations that have driven AI over the past decade.

I'm assuming that capabilities advance at roughly the same pace as they did over the past decade, say no more than a 5x speedup. So nothing that I'd classify as foom: in this scenario, foom is not imminent.

I'm assuming AIs will not be much more risk-averse than humans are. I'm unsure whether developers will have much control over this, or whether they'll even want AIs to be risk-averse.

I expect this world to have at least as wide a variety of AIs near the cutting edge as we have today, so it will at least weakly qualify as a multipolar scenario.

This scenario seems to imply wide variation among leading AIs in how wisely they evaluate their own abilities.

That suggests that before there's an AGI capable of taking over the world, there will be an AGI that mistakenly believes it can take over the world. Given mildly pessimistic assumptions about the goals of the leading AGIs, this AGI will attempt to conquer the world before one of the others does.

I'm imagining an AGI comparable to a child with an IQ of 200 and unusual access to resources.

This AGI would be better than, say, the US government or Google at tasks where it can get decent feedback through experience, such as manipulating public opinion.

Tasks such as fighting wars seem likely to be harder for it to handle, since they require causal models that are hard to test. The first AGIs seem likely to be somewhat weak at building such causal models relative to their other IQ-like capabilities.

So I imagine one or two AGIs might end up being influenced by less direct evidence, such as Eliezer's claims about the ease with which a superintelligence could create Drexlerian nanotech.

I can imagine a wide range of outcomes in which either humans shut down the AGI or the AGI dies because it cannot maintain its technology:

  • a fire alarm

  • progress is set back by a century

  • enough humans die that some other primate species ends up building the next AGI

  • Earth becomes lifeless

Some of you will likely object that an AGI will be wise enough to be better calibrated about its abilities than a human is. That will undoubtedly become true if the AGI takes enough time to mature before taking decisive action. But a multipolar scenario can easily pressure an AGI to act before other AGIs with different values make similar attempts to conquer the world.

We’ve got plenty of evidence from humans that being well calibrated about predictions is not highly correlated with capabilities. I expect AGIs to be well calibrated on predictions that can be easily tested. But I don’t see what would make an early-stage AGI well calibrated about novel interactions with humans.

How much attention should we pay to this scenario?

My gut reaction is that, given my (moderately likely?) assumptions about takeoff speeds and the extent of multipolarity, something like this has more than a 10% chance of happening. Am I missing anything important?