How do I know I’m not simulated by the AI to determine my reactions to different escape attempts? How much computing power does it have? Do I have access to its internals?
The situation seems somewhat underspecified to give a definite answer, but given the stakes I’d err on the side of terminating the AI with extreme prejudice. Bonus points if I can figure out a safe way to retain information on its goals so I can make sure the future contains as little utility for it as feasible.
The utility-minimizing part may be an overreaction but it does give me an idea: Maybe we should also cooperate with an unfriendly AI to such an extent that it’s better for it to negotiate instead of escaping and taking over the universe.
How do I know I’m not simulated by the AI to determine my reactions to different escape attempts? How much computing power does it have? Do I have access to its internals?
The situation seems somewhat underspecified to give a definite answer, but given the stakes I’d err on the side of terminating the AI with extreme prejudice. Bonus points if I can figure out a safe way to retain information on its goals so I can make sure the future contains as little utility for it as feasible.
The utility-minimizing part may be an overreaction but it does give me an idea: Maybe we should also cooperate with an unfriendly AI to such an extent that it’s better for it to negotiate instead of escaping and taking over the universe.