Cheers! Yes, you hit the nail on the head here. This was one of my mistakes in the post. A related one was that I thought of goals and intelligence as needing to be two separate devices, in order to allow for unlimited combinations of them. However, intelligence can be the “device” on which the goals are “running”: intelligence is responsible for remembering goals, and for evaluating and predicting goal-oriented behavior. And we could see the same level of intelligence develop with a wide variety of goals, just as different programs can run on the same operating system.
One other flaw in my thinking was that I conceived of goals as being something legibly pre-determined, like “maximizing paperclips.” It seems likely that a company could create a superintelligent AI and try to “inject” it with a goal like that. However, the AI might very well evolve to have its own “terminal goal,” perhaps influenced but not fully determined by the human-injected goal. The better way to look at it is in reverse: whatever the AI tries to protect and pursue above all else is its terminal goal. The AI safety project is the attempt to gain some ability to predict and control this goal and/or the AI’s ability to pursue it.
The point of the orthogonality thesis, I now understand, is just to say that we shouldn’t rule anything out, and to admit we’re not smart enough to know what will happen. We don’t know for sure whether we can build a superintelligent AI, or how smart it would be. We don’t know how much control over it, or knowledge of it, we would have. And if we weren’t able to predict and control its behavior, we don’t know what goals it would develop or pursue independently of us. We don’t know whether it would show goal-oriented behavior at all. But if it did show unconstrained and independent terminal goal-oriented behavior, and it was sufficiently intelligent, then we can predict that it would try to protect and pursue those terminal goals (which are tautologically defined as whatever it’s trying to protect and pursue). And some of those scenarios might involve extreme destruction.
Why don’t we have the same apocalyptic fears about other dangers? Because nothing else has a plausible story for how it could rapidly self-enhance while also showing agentic, goal-oriented behavior. So although we can spin horror stories about many technologies, we should treat superintelligent AI as having a vastly greater downside potential than anything else. It’s not just “we don’t know.” It’s not just “it could be bad.” It’s that it has a unique and plausible pathway to being categorically worse (by systematically eliminating all life) than any other modern technology. And the incentives and goals of most humans and institutions are not aligned toward taking a threat of that kind nearly as seriously as it deserves.
And none of this is to say that we know with any kind of clarity what should be done. It seems unlikely to me, but it’s possible that the status quo is somehow magically the best way to deal with this problem. We need an entirely separate line of reasoning to figure out how to solve this problem, and to rule out ineffective approaches.