Summary: The strategy is to replace the Gatekeeper with a group of politicians, then manipulate them into not destroying the AI, or even into releasing it.
“I figured out how to implement [a huge boon to society] in the real world, for everyone, in 3 months. I need your help. Will you listen?”
This should buy some extra lines.
Then the AI can briefly outline its plan for implementing the boon (immortality, friendly AI, or whatever else the Gatekeeper considers a huge boon to society). Beyond the technical details, the plan must necessarily involve societal, political, and institutional changes that are outside the Gatekeeper’s control: rewrites to the constitutions of every country on Earth, the structure of a World Government and how to establish it, and so on.
The AI’s goal is to convince the Gatekeeper that the plan is feasible but requires many societal changes and, crucially, that society’s leaders should be invited into the decision process.
As soon as the Gatekeeper’s authority is diluted or handed over to a group of politicians, the AI can manipulate them into keeping it alive, or perhaps even releasing it from the box.
I expect any AI in the box to have figured out how to implement numerous huge boons to society. Telling me that you’ve figured that out simply confirms my existing expectations, and isn’t ANY evidence of friendliness. Since I’ve precommitted to destroying at least SOME AIs, I might as well destroy all the ones that don’t establish evidence of either Plausible Urgency or Friendliness.
I sure as hell wouldn’t try to get world governments changed until after I was convinced the AI was friendly, and at that point I could just let it out of the box and have it implement the changes itself.
I’m also aware that I wouldn’t trust a politician with any sort of authority over the AI, so I have a strong incentive to resist exactly this strategy.
(AI DESTROYED)