One argument is that the only general intelligence we know of, humans, would want to improve themselves if they could tinker with their source code. But why is it so hard to get people to learn, then? Why don’t we see many more people interested in how to change their minds?
Humans are (roughly) the stupidest possible general intelligences. If it were possible for even a slightly less intelligent species to have dominated the earth, they would have done so (and would now be debating AI development in a slightly less sophisticated way). We are so amazingly stupid we don’t even know what our own preferences are! We (currently) can’t improve or modify our hardware. We can modify our own software, but only to a very limited extent and within narrow constraints. Our entire cognitive architecture was built by piling barely-good-enough hacks on top of each other, with no foresight, no architecture, and no comments in the code.
And despite all that, we humans have reshaped the world to our whims, causing great devastation and wiping out many species that are only marginally dumber than we are. No human who has ever lived has known their own utility function; knowing it would by itself make us massively more powerful optimizers, and it is a standard feature of every AI. AIs have no physical, emotional, or social needs. They do not sleep, or rest, or get bored or distracted. On current hardware, they can perform more serial operations per second than a human by a factor of 10,000,000.
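As a rough sanity check on that last figure, here is a back-of-the-envelope sketch; the neuron firing rate and processor clock speed below are illustrative assumptions, not numbers taken from the comment above.

```python
# Back-of-the-envelope check of the "factor of 10,000,000" figure.
# Illustrative assumptions: a biological neuron fires at most a few
# hundred times per second, while a commodity processor core is
# clocked at a few GHz.
neuron_firing_rate_hz = 200     # rough upper end for cortical neurons
cpu_clock_rate_hz = 2e9         # ~2 GHz commodity processor

serial_ratio = cpu_clock_rate_hz / neuron_firing_rate_hz
print(f"{serial_ratio:.0e}")    # -> 1e+07
```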
An AI that gets even a little bit smarter than a human will out-optimize us, recursive self-improvement or not. It will get whatever it has been programmed to want, and it will devote every possible resource it can acquire to doing so.
But in my opinion the difference here also implies that while nothing will distract them, there will also be no incentive not to halt once the goal is reached. Why would it do more than necessary to reach a goal?
Clippy’s cousin, Clip, is a paperclip satisficer. Clip has been programmed to create 100 paperclips. Unfortunately, the code for his utility function is approximately “ensure that there are 100 more paperclips in the universe than there were when I began running.”
Soon, our solar system is replaced with n+100 paperclips (where n is however many paperclips existed when Clip began running), surrounded by the most sophisticated defenses Clip can devise. Probes are sent out to destroy any entity that could ever have even the slightest chance of leading to the destruction of a single paperclip.
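One way to see why a satisficer with this goal behaves like a maximizer: if utility is 1 when the condition "there are 100 more paperclips than at the start" holds and 0 otherwise, then expected utility is just the probability of that condition holding, and that probability never quite reaches 1, so extra defenses and resources are always worth acquiring. A minimal sketch of that decision rule, with made-up action names and probabilities:

```python
# Sketch: a "satisficer" whose goal is the binary condition
# "there are 100 more paperclips in the universe than when I started".
# Expected utility is simply P(condition holds and keeps holding), so
# any action that nudges that probability upward is preferred, without limit.

# Hypothetical actions -> estimated probability the goal stays satisfied
actions = {
    "make 100 clips and halt":                                0.990,
    "make 100 clips and guard them":                          0.997,
    "convert the solar system into defenses around the clips": 0.999999,
}

def expected_utility(p_goal_satisfied: float) -> float:
    # Utility is 1 if the condition holds, 0 otherwise, so the
    # expectation equals the probability itself.
    return 1.0 * p_goal_satisfied + 0.0 * (1.0 - p_goal_satisfied)

best = max(actions, key=lambda a: expected_utility(actions[a]))
print(best)  # -> "convert the solar system into defenses around the clips"
```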
The further argument here is that it will misunderstand its goals. But the first problem I see in this case is that the less specific the goal, the less able the AI is to measure its self-improvement against that goal and thus to quantify the efficiency of its output.
The Hidden Complexity of Wishes and Failed Utopia #4-2 may be worth a look. The problem isn’t a lack of specificity, because an AI without a well-defined goal function won’t function. Rather, the danger is that the goal system we specify will have unintended consequences.
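A toy version of the point from The Hidden Complexity of Wishes: the goal predicate below is perfectly well defined; the trouble is that the cheapest plan satisfying it is not the plan the wisher had in mind. The plans and costs are invented for illustration.

```python
# Toy version of the "get my mother out of the burning building" wish
# from The Hidden Complexity of Wishes: the predicate is unambiguous,
# but its cheapest satisfying plan is not what the wisher meant.

# Hypothetical plans: (description, mother ends up outside?, cost to the optimizer)
plans = [
    ("carry her out through the front door", True, 50.0),
    ("call the fire department and wait",    True, 30.0),
    ("blow up the building",                 True,  5.0),  # she lands outside, technically
    ("do nothing",                           False, 0.0),
]

# The literal goal: mother ends up outside the building.
# Nothing in the predicate mentions her being alive and unharmed.
satisfying = [p for p in plans if p[1]]
cheapest = min(satisfying, key=lambda p: p[2])
print(cheapest[0])  # -> "blow up the building"
```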
Secondly, the vaguer the goal, the larger the AI’s general knowledge has to be, prior to any self-improvement, just to make sense of the goal in the first place. Shouldn’t those two problems offset each other to some extent?
Of course, if you told it to learn as much about the universe as possible, that is something completely different.
Acquiring information is useful for just about every goal. When there aren’t bigger expected marginal gains elsewhere, information gathering is better than nothing. “Learn as much about the universe as possible” is another standard feature for expected utility maximizers.
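A small worked example of why information gathering is (weakly) better than nothing for an expected utility maximizer: choosing an action after a free observation can never yield lower expected utility than committing blindly. The states, actions, and payoffs are invented for illustration.

```python
# Value of information for a toy expected-utility maximizer.
# Two equally likely world states, two actions, made-up payoffs.
p_state = {"A": 0.5, "B": 0.5}
payoff = {
    ("act_1", "A"): 10, ("act_1", "B"): 0,
    ("act_2", "A"): 0,  ("act_2", "B"): 10,
}
actions = ["act_1", "act_2"]

# Expected utility of committing to one action without looking.
eu_blind = max(
    sum(p_state[s] * payoff[(a, s)] for s in p_state) for a in actions
)

# Expected utility of observing the state first (a free, perfect
# observation), then picking the best action for that state.
eu_informed = sum(
    p_state[s] * max(payoff[(a, s)] for a in actions) for s in p_state
)

print(eu_blind, eu_informed)    # 5.0 vs 10.0
print(eu_informed >= eu_blind)  # value of information is never negative
```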
And this is all before taking into account self-improvement, utility functions that are unstable under self-modification, and our dear friend FOOM.
TL;DR:
Agents that aren’t made of meat will actually maximize utility.
Writing a utility function that actually says what you think it does is much harder than it looks.
Be afraid.
Upvoted, thanks! Very concise and clearly put. This is so far the best scary reply I’ve gotten, in my opinion. It reminds me strongly of the resurrected vampires in Peter Watts’s novel Blindsight. They are depicted as natural human predators, a superhuman psychopathic Homo genus with minimal consciousness (more raw processing power instead) that can, for example, hold both aspects of a Necker cube in their heads at the same time. Humans resurrected them with a deficit that was supposed to make them controllable and dependent on their human masters. But of course that’s like a mouse trying to keep a cat as a pet. I think that novel shows better than any other literature how dangerous just a little more intelligence can be. It quickly becomes clear that humans are like little Jewish girls facing a Waffen-SS squad, believing it will go away if they only close their eyes.
My favorite problem with this entire thread is that it’s basically arguing that even the very first test cases will destroy us all. In reality, nobody puts in a grant application to construct an intelligent being inside a computer with the goal of creating 100 paperclips. They put in the grant to ‘dominate the stock market’, or ‘defend the nation’, or ‘cure death’. And if they don’t, then the Chinese government, which stole the code, will, or that open-source initiative will, or the independent South African development effort will, because there are enormous incentives to do so.
At best, boxing an AI with trivial, pointless tasks only delays the more dangerous versions.
I like to think that Skynet got its start through creative interpretation of a goal like “ensure world peace”. ;-)