I don’t think I disagree with any of this, but I’m not incredibly confident that I understand it fully. I want to rephrase in my own words in order to verify that I actually do understand it. Please someone comment if I’m making a mistake in my paraphrasing.
1. As time goes on, the threshold of ‘what you need to control in order to wipe out all life on earth’ goes down. In the Bronze Age it was probably something like ‘the mind of every living person’. Later it was something like ‘the command and control node of a major nuclear power’. Nowadays it is something like ‘a lab where viruses can be made’.
2. AI is likely to push the threshold described in 1 still further, by inventing nanotechnology or other means that we cannot anticipate. (The capability of someone or something smarter than you is an unknown unknown, just as dogs can’t properly assess the danger of a human’s actions.) It would be insufficient to keep AIs away from every virus lab; we don’t know what else sits alongside a virus lab on the ‘can annihilate life’ axis for something smarter than us.
3. For any given goal X, ‘be the only player’ is a really compelling subgoal. Consequently, as ‘wipe out all life on earth’ becomes easier and easier, we should expect that anyone or anything not explicitly unable to do so will do so. A paperclip collector, a stock price maximizer, and a hostile regime are all one and the same as far as ‘will wipe you out without compunction once the button that does so becomes available to press’ is concerned.
4. Putting together 2 and 3, it is reasonable to suppose that if an AI capable of 2 exists with goals broadly described by 3 (both of which are pretty well baked into the description of ‘AI’ that most people subscribe to), it will wipe out life on earth. Stipulating that this chain of logic is broadly valid, we can say that ‘an AI that is motivated to destroy the world and capable of doing so grows more likely to exist every year.’
5. The ‘alignment problem’ is the problem of making an AI that is capable of destroying the world but does not do so. Such an AI can be described as ‘aligned’ or ‘friendly’. Creating such a thing has not yet been accomplished, and seems very difficult, basically because any AI with goals will see that ending life would be tremendously useful to those goals, and all the versions of ‘make the goals tie in with keeping life around’ or ‘put up a fence in its brain that doesn’t let it do what you don’t want’ amount to dogs trying to figure out how to keep humans from harming them.
6. You can’t regulate what you can’t understand; you can’t understand what you can’t simulate; and you can’t simulate a greater intelligence (because if you could, you would already have that greater intelligence).
7. Our current inability to create a Friendly AI is not the limit of our woes, because even creating one would not protect us from someone else creating a regular garden-variety AI that would annihilate us. As trend 1 above continues, and omnicide as a tool comes into the hands of ever more actors, each and every one of them must refrain.
8. A Friendly AI would need to strike preemptively at the possibility of other AIs coming into existence, and every variation of doing so would be unacceptable to its human partners. (Broadly speaking, ‘destroy all microchips’ suffices as the socially acceptable way to phrase the enormity of this challenge.) Any version of this would be far less tractable, given what we understand of an AI’s capabilities, than ‘synthesize a death plague’.
In the face of trend 4 above, then, our hope is gated behind two impossibilities:
A. Creating an Aligned AI is a task that is beyond our capacity, while creating an Unaligned AI is increasingly possible. We want to do the harder thing before someone does the easier one.
B. Once created, the Aligned AI has a harder task than an Unaligned AI: it must abort all Unaligned AIs while leaving humanity alive. It is possible that the delta between these tasks will be decisive. The actions necessary for this task will slam directly into whatever miracle let A occur.
To sum up this summary: The observable trends lead to worldwide death. That is the commonplace, expected outcome of the sensory input we are receiving. In order for that not to occur, multiple implausible things have to happen in succession, which they obviously won’t.