If AI has sense enough to avoid making any convincingly threatening moves, it won’t be possible to convince people to essentially disrupt the whole economy in an attempt to exterminate it (even better, nobody notices at all).
Could you elaborate on “even better, nobody notices at all”? Any AI capable of efficient self-modification must be able to grasp its own workings and make predictions about improvements to various algorithms and its overall decision procedure. If an AI can do that, why would the humans who build it be unable to notice any malicious intentions? Why wouldn’t the humans who created it be able to use the same algorithms that the AI uses to predict what it will do? If humans are unable to predict what the AI will do, how is the AI able to predict what improved versions of itself will do?
In other words, could you elaborate on why you believe that what the AI is going to do will be opaque to its creators but predictable to its initial self?
I am also rather confused about how an AI is believed to be able to hide its attempts to build molecular nanotechnology. It doesn’t seem very inconspicuous to me.
If the AI creates some relatively simple free-ranging backup viruses that re-assemble a working AI whenever they can (e.g. by forming something like a decentralized p2p network that redundantly stores its data while the AI can’t run), then even shutting down all instances of the AI in the world won’t cure the infection: it will come back whenever you restore the Internet, or even a local network that lets previously infected computers back in.
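The resilience of such a scheme rests on ordinary redundant storage: split the data into shards, replicate each shard on several nodes, and the whole can be reassembled as long as at least one copy of every shard survives somewhere. A minimal sketch of that idea (all names and the round-robin placement are illustrative, not any real protocol):

```python
import math

def shard(data: bytes, n_shards: int) -> list:
    """Split data into roughly equal consecutive shards."""
    size = math.ceil(len(data) / n_shards)
    return [data[i * size:(i + 1) * size] for i in range(n_shards)]

def replicate(shards, nodes, copies_per_shard):
    """Store each shard on several distinct nodes (round-robin here)."""
    for i, s in enumerate(shards):
        for j in range(copies_per_shard):
            nodes[(i + j) % len(nodes)][i] = s

def reassemble(nodes, n_shards):
    """Recover the data if every shard index survives on some node."""
    recovered = {}
    for node in nodes:
        recovered.update(node)
    if len(recovered) < n_shards:
        return None  # some shard was lost on every surviving node
    return b"".join(recovered[i] for i in range(n_shards))

data = b"payload to be stored redundantly across the network"
nodes = [dict() for _ in range(10)]  # each node is just a shard store
replicate(shard(data, 4), nodes, copies_per_shard=3)

# Wiping half the nodes still leaves at least one copy of every shard,
# so the data comes back; wiping too many does not.
assert reassemble(nodes[:5], 4) == data
assert reassemble(nodes[:2], 4) is None
```

This is why “shutting down all instances” doesn’t help: recovery only requires that one replica of each shard remain reachable, not that any particular machine survive.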
If you assume a world/future in possession of vastly more advanced technology than our current world, then I don’t disagree with you. If it takes a very long time for the first GAI to be created, and if it is then created by means of a single breakthrough that somehow combines all previous discoveries and expert systems into a much more powerful single entity, with huge amounts of hard-coded knowledge, a complex utility function and various dangerous drives, then I agree. It wouldn’t even take the strong version of recursive self-improvement to pretty much take over the world under those assumptions.
If an AI can do that, why would the humans who build it be unable to notice any malicious intentions?
I meant not noticing that it escaped to the Internet. But “noticing malicious intentions” is a rather strange thing to say. You notice behavior, not intentions. It’s stupid to signal your true intentions if you’ll be condemned for them.
Why wouldn’t the humans who created it be able to use the same algorithms that the AI uses to predict what it will do?
Predict what it will do, in what sense, and to what end? An AI in the wild acts depending on what it encounters; all instances are unique (and wary of watchers).
In other words, could you elaborate on why you believe that what the AI is going to do will be opaque to its creators but predictable to its initial self?
I didn’t talk of this.
If it takes a very long time for the first GAI to be created, and if it is then created by means of a single breakthrough that somehow combines all previous discoveries and expert systems into a much more powerful single entity, with huge amounts of hard-coded knowledge, a complex utility function and various dangerous drives, then I agree.
I don’t see how those assumptions are relevant. Also, all drives are dangerous, to the extent their combination differs from ours. Utility is not temper or personality or tendency to act in a certain way. Utility is what shapes long-term plans, any of whose elements might have arbitrary appearance, as necessary to dominate the circumstances.
In other words, could you elaborate on why you believe that what the AI is going to do will be opaque to its creators but predictable to its initial self?
I didn’t talk of this.
Maybe I misunderstood you. But I still believe that it is an important question.
To be able to self-improve efficiently, an AI has to make some sort of prediction about how modifications will affect its behavior. The desired solution is actually much stronger than that: the AI would have to prove the friendliness of its modified self, that is, of its successor, with respect to its utility function.
The question is, if the AI can make such predictions about the behavior of improved versions of itself, why wouldn’t humans be able to do the same?
The fear is that an AI will do something that eventually leads to the extinction of all human value. But the AI must have the same fear about improved versions of itself: it must fear that its successor will cause the demise of what it values, and therefore it has to be able to make sure that this won’t happen. But why wouldn’t humans be able to do the same?
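The gap between checking a successor and proving things about it can be made concrete. If the original AI (or its human builders) can only evaluate a candidate successor’s utility function on a finite sample of situations, two functions can agree on every tested case and still diverge elsewhere. A toy illustration (both functions are invented for the example):

```python
def original_utility(state: int) -> int:
    """The utility function the successor is supposed to preserve."""
    return 2 * state

def successor_utility(state: int) -> int:
    """Agrees with the original on every state below 1000, diverges above."""
    return 2 * state if state < 1000 else -state

# An empirical check on a finite sample of states passes completely...
sampled_states = range(100)
assert all(original_utility(s) == successor_utility(s) for s in sampled_states)

# ...yet the functions are not identical: on an unsampled state they
# disagree, so an optimizer maximizing successor_utility would behave
# very differently there.
assert original_utility(5000) != successor_utility(5000)
```

This is why finite testing alone cannot establish that a successor optimizes the same thing; both the AI verifying its successor and the humans verifying the AI face that same limitation, unless one of them can produce an actual proof.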
An AI is not a black box to itself. It won’t be a black box to its creators. Inventing molecular nanotechnology and taking over the world in its spare time seems like something that should be noticeable.
What if the AI makes mistakes? That is, what if it mistakenly believes the successor it has just written has the same utility function, the same way a human could mistakenly believe the AI they have just built is friendly? In the same vein, what if the AI cannot accurately assess its own utility function, but goes on optimizing anyway?
Such a badly done AI may automatically flatline, and not be able to improve itself. I don’t know. But even if the AI is friendly to itself, we humans could still botch the utility function (even if that utility function is as meta as CEV).