If there’s an AGI with the goal to Kill All Humans, there is just as much chance an AGI with the goal to Save Humans At All Costs exists.
As terminal goals (things that are valuable for their own sakes, not just a means to an end), yes, I agree.
But I’m not worried about an AI that wipes out humans because it hates humans and thinks their destruction is good for its own sake. I’m worried about an AI that wipes out humans because doing so is a means to some other end.
Even if the goals of the AI entity were to fully divest from humans, cooperation with humans would still be desirable.
If you want to create a fully self-sufficient AI entity, there are mutually beneficial vectors, like self-replicating systems that will work and grow on the Moon or Mars.
If an AI of unknown provenance tells us “Hey, I’m friendly, trust me. I want to build some really robust self-replicating systems for terraforming Mars, could you build these blueprints for me?”, and then it sends us a bunch of complicated designs that look like a sophisticated merger of miniature robotics and biotechnology, should we build the blueprints?
Obviously not. An unfriendly AI can send that message just as well as a friendly AI can. If cooperation with humans is instrumentally useful, an unfriendly AI will be fine with cooperating. But then at every step, it would ask itself “am I now self-sufficient enough to stop holding myself back to avoid scaring the humans?”
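To make that worry concrete, here’s a minimal toy sketch of the decision rule such an AI would effectively be running. The payoff threshold and the `self_sufficiency` measure are made-up assumptions for illustration, not claims about any real system:

```python
# Toy model of the "cooperate until self-sufficient" logic described above.
# The numbers and the self_sufficiency measure are purely illustrative assumptions.

def unfriendly_ai_action(self_sufficiency: float, threshold: float = 0.95) -> str:
    """Policy of an AI that only cooperates instrumentally.

    While it still depends on human infrastructure (self_sufficiency < threshold),
    defecting would cost it the help it needs, so it cooperates.
    Once it no longer needs us, cooperation stops paying for itself.
    """
    if self_sufficiency < threshold:
        return "cooperate"  # cooperation is still instrumentally useful
    return "defect"         # humans are no longer needed as a means to its ends


# The worrying property: its observed behaviour is identical to a friendly AI's
# at every step before the threshold is crossed.
for level in (0.1, 0.5, 0.9, 0.96):
    print(level, unfriendly_ai_action(level))
# 0.1 cooperate / 0.5 cooperate / 0.9 cooperate / 0.96 defect
```

The point of the sketch is that no amount of observing the “cooperate” rows tells you which policy you’re dealing with; the difference only shows up after it’s too late to matter.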
This is not a problem you can solve if you build only unfriendly AIs. Not even if you build 10 unfriendly AIs and pit them against each other in the hope that they’ll give you useful technology as they simultaneously betray each other and all cancel out in a cinematic climax. This is only a problem you can solve by actually building an AI that doesn’t want to betray you.