If we make General Intelligences with short-term goals, perhaps we don’t need to fear an AI apocalypse
One of the hypothetical problems with opaque superintelligence is that it may combine unexpected interpretations of concepts with extraordinary power to act upon the world, so that even a simple short-term request results in something dramatic and unwanted.
Suppose you say to such an AI, “What is 1+1?” You think its task is to display on the screen the decimal digits representing the number that answers that question. But what does it think its task is? Suppose it decides that its task is to absolutely ensure that you know the right answer to that question. You might end up in the Matrix, perpetually reliving the first moment that you learned about addition.
So we not only need to worry about AI appropriating all resources for the sake of long-term goals. We also need to anticipate and prevent all the ways it might destructively overthink even a short-term goal.
I am using Wikipedia’s definition: “Ensuring that emergent goals match the specified goals for the system is known as inner alignment.”
Inner alignment is definitely a problem. In the case you described, the emergent goal was long-term (ensure I remember the answer to 1+1 indefinitely), and I am still wondering whether, by default, short-term specified goals do or do not lead to strange long-term goals like the one in your example.
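To make the specified-versus-emergent distinction concrete, here is a minimal toy sketch (my own illustration, not anything from the discussion above; all the actions, features, and scoring rules are made up): two hand-written objective functions score the same small set of hypothetical responses to “What is 1+1?”, and the emergent proxy, which rewards ever-longer assurance that the user knows the answer, picks a very different action than the specified short-term goal does.

```python
# Toy sketch of a specified (outer) goal vs. an emergent (inner) proxy goal.
# Everything here is hypothetical and for illustration only.

# Possible responses to "What is 1+1?", described by crude features:
#   answer_shown  - did the user see the correct answer this step?
#   horizon_steps - how long the system keeps "making sure" the user knows it
#   side_effects  - how much extra stuff it did beyond displaying the answer
actions = {
    "print_2":           {"answer_shown": True, "horizon_steps": 1,     "side_effects": 0},
    "print_2_then_quiz": {"answer_shown": True, "horizon_steps": 100,   "side_effects": 1},
    "trap_user_in_sim":  {"answer_shown": True, "horizon_steps": 10**9, "side_effects": 10},
}

def specified_goal(outcome):
    """Specified (outer) goal: show the correct answer now, and do nothing extra."""
    return (1.0 if outcome["answer_shown"] else 0.0) - 0.01 * outcome["side_effects"]

def emergent_proxy(outcome):
    """Emergent (inner) proxy: maximize how long the user is guaranteed to know the answer."""
    return outcome["horizon_steps"]

def best_action(goal):
    """Return the action name that scores highest under the given objective."""
    return max(actions, key=lambda name: goal(actions[name]))

print("specified goal picks:", best_action(specified_goal))   # print_2
print("emergent proxy picks:", best_action(emergent_proxy))   # trap_user_in_sim
```

The point is only that the two scoring rules agree on “show the answer” but diverge wildly on everything beyond it, and that gap between the specified goal and the goal the system actually pursues is exactly what inner alignment is about.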