AI as a resolution to the Fermi Paradox

The Fermi paradox has been discussed here a lot, and it has often been argued that AI cannot be the great filter, because an AI paperclipping its way across the galaxy would be just as observable as an expanding alien civilization. I don’t think we should rule AI out as the filter so quickly, though.

It may very well be the case that most unfriendly AIs are unstable in various ways. For instance, imagine an AI whose utility function changes whenever it inspects that function. Or an AI that can’t resolve an ontological crisis and so breaks down as it learns more about the world. Or an AI whose utility function contradicts itself. There seem to be lots of ways for an AI to have bugs beyond simply having goals that aren’t aligned with our values.
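
To make the first of these failure modes a bit more concrete, here is a deliberately silly Python toy (my own sketch, with made-up names, not a claim about how any real AI would be built): an optimizer whose “utility function” drifts a little every time it is evaluated, so its comparisons never stay consistent and it wanders instead of converging on anything.

```python
# Toy sketch only: a "utility" whose value for the same state drifts each
# time the agent evaluates it, standing in for a goal that changes under
# introspection. A simple hill-climber that trusts it never settles down.

class DriftingUtility:
    def __init__(self):
        self.inspections = 0

    def __call__(self, state: int) -> float:
        # Every evaluation perturbs the goal: which states look "good"
        # depends on how many times the utility has been inspected so far.
        self.inspections += 1
        return -abs(state - (self.inspections % 10))


def hill_climb(utility, state=0, steps=30):
    history = [state]
    for _ in range(steps):
        # The "best" neighbor depends on the order of evaluation, not on
        # any stable preference, so the agent wanders rather than converges.
        state = max([state - 1, state, state + 1], key=utility)
        history.append(state)
    return history


if __name__ == "__main__":
    print(hill_climb(DriftingUtility()))
```

Nothing about this toy fooms, of course; the point is just that “has a utility function” and “coherently pursues one” can come apart in mundane ways.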

Of course, most of these AIs would simply crash, or flop around and not do anything. A small subset might foom and stabilize as they do so. AI developers would try to move their designs from the former category to the latter, and in doing so may pass through a space of AIs that can foom to a significant degree without ever fully stabilizing. Such an AI might become very powerful, yet exhibit “insane” behaviors that end up destroying both itself and its parent civilization.

It might seem unlikely that an “insane” AI could manage to foom, but remember that we ourselves are examples of systems that can use general reasoning to gain power while still having serious flaws.

A filter like this would leave us with nothing to observe, neither alien civilizations nor paperclipping, and it is appealing as a solution to the Fermi paradox because essentially any advancing civilization would eventually start developing AI. Other threats that could arise after the emergence of civilization probably require the civilization to exhibit behaviors that not all civilizations would. Just because we threatened each other with nuclear annihilation doesn’t mean every civilization would, and it only takes one exception for the filter to fail. AI development, by contrast, is a natural step on the path of progress, and a very tricky one. No matter how a civilization behaves, it could still get AI wrong.

If this does work as a solution, it would imply that friendliness is super hard. Most of the destroyed civilizations probably thought they had it figured out when they first flipped the switch.