The basic claims that lead to that conclusion are
Orthogonality Thesis: how “smart” an AI is has (almost) no relationship to what its goals are. It might seem stupid to a human to want to maximize the number of paperclips in the universe, but there’s nothing “in principle” that prevents an AI from being superhumanly good at achieving goals in the real world while still having a goal that people would think is as stupid and pointless as turning the universe into paperclips.
Instrumental Convergence: there are some things that are very useful for achieving almost any goal in the real world, so most possible AIs that are good at achieving things in the real world would try to do them. For example, self-preservation: it’s a lot harder to achieve a goal if you’re turned off, blown up, or if you stop trying to achieve it because you let people reprogram you and change what your goals are. “Acquire power and resources” is another such goal. As Eliezer has said, “the AI does not hate you, nor does it love you, but you are made out of atoms which it can use for something else.”
Complexity of Value: human values are complicated, and messing up one small aspect can result in a universe that’s stupid and pointless. One of the oldest SF dystopias ends with robots designed “to serve and obey and guard men from harm” taking away almost all human freedom (for their own safety) and taking over every task humans used to do, leaving people with nothing to do except sit “with folded hands.” (Oh, and humans who resist are given brain surgery to make them stop wanting to resist.) An AI that’s really good at achieving arbitrary real-world goals is like a literal genie: prone to giving you exactly what you asked for and exactly what you didn’t want.
Current machine learning methods are completely incapable of addressing any of these problems, and in practice they do tend to produce “perverse” solutions to the problems we give them. If we used them to make an AI that was superhumanly good at achieving arbitrary goals in the real world, we wouldn’t be able to reliably give it a goal of our choice; we wouldn’t be able to tell what goal it actually ended up with if we tried; and even if we could make sure that the goal we intend to give it and the goal it learns are exactly the same, we still couldn’t be sure that any (potentially useful) goal we specify wouldn’t also result in the end of the world as an unfortunate side effect.
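To make the “perverse solutions” point concrete, here is a minimal toy sketch of specification gaming (my own illustration, not anything from the thread or from Yudkowsky’s writing): the intended task is to reach a goal cell, but the reward actually written down pays for movement, so a naive optimizer learns to pace back and forth forever instead of finishing. All names and numbers are made up for the example.

```python
# Toy illustration of "specification gaming": the optimizer maximizes the
# reward we literally wrote down, not the goal we had in mind.
import random

GOAL = 10           # intended objective: reach cell 10 on a 1-D line
EPISODE_STEPS = 50

def run(policy):
    """Run one episode using a short cyclic policy of +1/-1 moves."""
    pos, reward = 0, 0
    for t in range(EPISODE_STEPS):
        pos += policy[t % len(policy)]
        reward += 1                      # proxy reward: paid for moving at all
        if pos == GOAL:
            return reward, True          # intended outcome ends the episode
    return reward, False

# Naive random search over tiny policies, keeping whatever scores highest.
best_policy, best_reward, reached_goal = None, -1, False
for _ in range(2000):
    candidate = [random.choice([-1, 1]) for _ in range(4)]
    r, reached = run(candidate)
    if r > best_reward:
        best_policy, best_reward, reached_goal = candidate, r, reached

print("best policy:", best_policy)
print("proxy reward:", best_reward, "| reached the goal:", reached_goal)
# The highest-scoring policies are exactly the ones that never reach the
# goal: reaching it ends the episode early and forfeits the remaining
# movement reward, while pacing back and forth collects the full 50.
```

Nothing here is malicious or mysterious; the search simply found the highest-scoring behaviour under the reward as written, which is the same failure mode the “genie” framing above points at, just at toy scale.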
The point isn’t that I’m unaware of the orthogonality thesis; it’s that Yudkowsky doesn’t present it in his recent popular articles and podcast appearances[0]. So he asserts that the creation of superhuman AGI will almost certainly lead to human extinction (unless massive amounts of alignment research have first been successfully carried out), but he doesn’t present an argument for why that is the case. Why doesn’t he? Is it because he thinks normies cannot comprehend the argument? Is this not a black pill? IIRC, on the Bankless podcast he did assert that superhuman AGI would likely decide to use our atoms, but he didn’t present a convincing argument in favour of that position.
[0] see the following: https://time.com/6266923/ai-eliezer-yudkowsky-open-letter-not-enough/
Yeah, the letter on Time Magazine’s website doesn’t argue very hard that superintelligent AI would want to kill everyone, only that it could kill everyone, and it focuses on what it would actually take to implement “then don’t make one”.
To be clear, that it more-likely-than-not would want to kill everyone is the article’s central assertion. “[Most likely] literally everyone on Earth will die” is the key point. Yes, he doesn’t present a convincing argument for it, and that is my point.