“Program an AI to supply you with gold” doesn’t say anything concrete, and therefore implies precisely the kind of utility function I was suggesting to avoid. In my example, the AI is programmed to print text messages and to choose chess moves—those are concrete, and as long as it is limited to those it cannot gain resources or take over the world. It is true that printing text messages could serve the additional goal of taking over the world, and even choosing chess moves could serve malicious goals like driving people insane. But it isn’t difficult to ensure this doesn’t happen: in the case of the chess moves, by making sure that it is optimizing on the board position alone; something similar could be done with the text chat.
Despite not having an unlimited goal, its intelligence will still be useful, because it makes those particular optimizations better.
“Program an AI to supply you with gold” doesn’t say anything concrete, and therefore implies precisely the kind of utility function I was suggesting to avoid.
In which case, I refer you back to my final paragraph:
Now, if you only ever give the AI narrowly defined tasks where you know exactly what it will do to carry them out, then maybe you’re safe. But if you’re doing that then why does it need to be intelligent in the first place?
You don’t need to know exactly what it will do. For example, in the chess-playing case, you know that it will analyze chess positions and pick a chess move. You don’t have to know exactly how it will do that analysis (although you do know it will analyze without gaining resources etc.). The more intelligent it is, the better it will do that analysis.
Sure. But it seems to me that the very essence of what we call intelligence—of what would distinguish something we were happy to call “artificially intelligent” from, say, a very good chess-playing program—is precisely the fact of not operating solely within a narrowly defined domain like this.
Saying “Artificial general intelligence is perfectly safe: we’ll just only ever give it tasks as clearly defined and limited as playing chess” feels like saying “Nuclear weapons are perfectly safe: we’ll just make them so they can’t sustain fission or fusion reactions”.
Incidentally: in order to know that “it will analyze without gaining resources etc.”, surely you do need to know pretty much exactly how it will do its analysis. Especially as “etc.” has to cover the whole panoply of ways in which a superintelligent AI might do things we don’t want it to do. So it’s not enough just to give the AI tasks like “win this game of chess”; you have to constrain its way of thinking so that you know it isn’t doing anything you don’t completely understand. Which, I repeat, seems to me to take away all reasons for making a superintelligent AI in the first place.
I do not agree that you have to completely understand what it is doing. As long as it uses a fixed objective function that evaluates positions and outputs moves, and that function itself is derived from the game of chess rather than from any premises concerned with the world, it cannot do anything dangerous, even if you have no idea of the particulars of that function.
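To make that concrete, here is a minimal sketch of the kind of move-chooser I have in mind, written with the python-chess library; the piece values and the one-ply search are just illustrative choices and not part of the argument. The point is that the only input to the objective function is a board position, so nothing it optimizes refers to anything outside the game.

```python
# Minimal illustrative sketch: a move-chooser whose objective function
# depends on nothing but the chess position itself.
# Uses the python-chess library; piece values and the one-ply search are
# arbitrary illustrative choices, not anyone's actual proposal.

import chess

# Conventional material values, indexed by python-chess piece types.
PIECE_VALUES = {
    chess.PAWN: 1,
    chess.KNIGHT: 3,
    chess.BISHOP: 3,
    chess.ROOK: 5,
    chess.QUEEN: 9,
    chess.KING: 0,  # the king never leaves the board, so no material value
}


def evaluate(board: chess.Board) -> float:
    """Score a position from the point of view of the side to move.

    The only argument is the board; the function has no access to
    anything outside the game, which is the point under discussion.
    """
    score = 0.0
    for piece in board.piece_map().values():
        value = PIECE_VALUES[piece.piece_type]
        score += value if piece.color == board.turn else -value
    return score


def choose_move(board: chess.Board) -> chess.Move:
    """Pick the legal move whose resulting position scores best."""
    best_move, best_score = None, float("-inf")
    for move in board.legal_moves:
        board.push(move)
        # After push(), it is the opponent's turn, so negate their score.
        score = -evaluate(board)
        board.pop()
        if score > best_score:
            best_move, best_score = move, score
    return best_move


if __name__ == "__main__":
    board = chess.Board()
    print(choose_move(board))  # prints some legal move from the initial position
```

A real engine would search far deeper and use a far more sophisticated evaluation, but however strong it becomes, the quantity it optimizes remains a function of the board position alone.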
Also, I am not proposing that the function of an AI has to be this simple. This is a simplification to make the point easier to understand. The real point is that an AI does not have to have a goal in the sense of something like “acquiring gold”, that it should not have such a goal, and that we are capable of programming an AI in such a way as to ensure that it does not.