I have a couple of questions about this subject...
Does it still count if the AI “believes” that it needs humans when it, in fact, does not?
For example, does it count if you code into the AI the belief that it is being run in a “virtual sandbox,” watched by a smarter “overseer,” and that if it takes out the human race in any way it will be shut down/tortured/assigned a hugely negative utility by said overseer? (See the toy sketch after these questions.)
Just because an AI needs humans to exist, does that really mean that it won’t kill them anyway?
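A minimal sketch of the “believed overseer” setup from the question above, assuming a simple expected-utility agent; the credence, penalty, and utility numbers are all made up for illustration:

```python
# Toy sketch (illustrative only): an expected-utility agent that assigns
# some credence to being watched by a sandbox "overseer" which heavily
# punishes any plan that harms humans. All names and numbers here are
# assumptions, not anything from the original discussion.

def expected_utility(base_utility, harms_humans, p_overseen, overseer_penalty):
    """Expected utility of a plan, given credence `p_overseen` that an
    overseer exists and will apply `overseer_penalty` to harmful plans."""
    if not harms_humans:
        return base_utility
    # With probability p_overseen the overseer punishes the plan;
    # otherwise the plan pays off as normal.
    return p_overseen * overseer_penalty + (1 - p_overseen) * base_utility

safe_plan = expected_utility(base_utility=100.0, harms_humans=False,
                             p_overseen=0.1, overseer_penalty=-1e9)
harmful_plan = expected_utility(base_utility=1000.0, harms_humans=True,
                                p_overseen=0.1, overseer_penalty=-1e9)
print(safe_plan, harmful_plan)  # the harmful plan loses even at 10% credence
```

The point is that the belief only has to be credible enough, and the believed penalty large enough, for the harmful option to lose on expectation.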
This argument seems to be contingent on the AI wishing to live, and wishing to live is not a property of all intelligence. If an AI were smarter than anything else out there but depended on lesser, provenly irrational beings for its continued existence, that would not mean it wants to “live” that way forever. It could want either to gain independence or to cease to exist, neither of which is necessarily healthy for its “supporting units”.
Or it might not care either way whether it lives or dies, because stopping all work on the planet matters more to it as a way of slowing the entropic death of the universe.
It may be the case that an AI does not want to live reliant on “lesser beings” and sees the only way of ensuring its own permanent destruction as destroying any being capable of creating it again, along with any future possibility of such life evolving. It may decide to blow up the universe to make extra sure of that.
Come to think of it, a suicidal AI could be a pretty big problem...
It’s probably been thought of here and other places before, but I just thought of the “Whoops AI”—a superhuman AGI that accidentally or purposefully destroys the human race, but then changes its mind and brings us back as a simulation.
There is an idea I called “eventually-Friendly AI”, where an AI is given a correct but very complicated definition of human values, so that it needs a lot of resources to make heads or tails of it; in the process it might behave rather indifferently to everything except the problem of figuring out what its goal definition says. See the comments to this post.
This is commonly referred to as a “counterfactual” AGI.
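A rough sketch of the “eventually-Friendly AI” setup described above, assuming the goal definition is handed over in a form that is expensive to interpret; the hash-based stand-in below is purely my own illustration:

```python
# Toy sketch (illustrative only): the agent's true goal is given in a
# form that takes a lot of work to interpret, so its early behaviour is
# dominated by decoding the goal rather than acting on the world.
import hashlib

# Stand-in for a correct but very complicated goal definition: here it is
# just a hash the agent must invert, which is obviously not how a real
# value specification would look.
GOAL_DIGEST = hashlib.sha256(b"goal-742").hexdigest()

def matches_goal(candidate: bytes) -> bool:
    return hashlib.sha256(candidate).hexdigest() == GOAL_DIGEST

def run_agent(compute_budget: int) -> str:
    for step in range(compute_budget):
        candidate = f"goal-{step}".encode()
        if matches_goal(candidate):
            return f"goal decoded at step {step}; start optimizing it"
        # While the goal is still opaque, the agent has nothing to
        # optimize, so it stays indifferent to everything except the
        # decoding problem itself.
    return "budget spent; still decoding, still indifferent"

print(run_agent(compute_budget=1000))   # decodes the goal
print(run_agent(compute_budget=100))    # still indifferent
```

Whether that interim indifference is actually safe is, of course, the whole question.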
We mention the “layered virtual worlds” idea, in which the AI can’t be sure of whether it has broken out to the “top level” of the universe or whether it’s still contained in an even more elaborate virtual world than the one it just broke out of. Come to think of it, Rolf Nelson’s simulation argument attack would probably be worth mentioning, too.
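One way to make the “layered virtual worlds” point concrete: if the AI's prior over the number of nested sandboxes is memoryless (a geometric prior is assumed here purely for illustration), then breaking out of any number of layers never raises its credence of having reached the top level.

```python
# Toy calculation (illustrative only): with a geometric prior over the
# number of nested sandbox layers, escaping k layers tells the AI only
# that there were at least k, and its posterior credence of now being at
# the real top level stays fixed at 1 - q.

def p_exactly_k_layers(k: int, q: float) -> float:
    """Prior probability that there are exactly k sandbox layers."""
    return (1 - q) * q**k

def credence_at_top(k: int, q: float) -> float:
    """P(at the top level | already broke out of k layers)."""
    p_at_least_k = q**k  # geometric tail: sum over j >= k of (1-q)*q**j
    return p_exactly_k_layers(k, q) / p_at_least_k

for k in (0, 1, 5, 20):
    print(k, round(credence_at_top(k, q=0.3), 3))
# Prints 0.7 for every k: more escapes are not, on their own, evidence of
# having reached the top, so a believed overseer penalty keeps weight q.
```

Under a different prior the numbers change, but as long as the AI can never drive its credence in a further layer to zero, the deterrent term never fully drops out of its calculation.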