a solid theory of Friendliness that should allow you to check that I am, indeed, Friendly
I would consider this cheating; you can’t affect the Gatekeeper’s choice with your AI magic. Same reason you can’t hack them directly through the optic nerve or kill and replace them with nanotech disguised as a cancer cure. The point is to show that the AI is able to persuade the human; a real AI might escape some other way, or use arguments that wouldn’t occur to the player, but that isn’t what this test is supposed to show.
Also, how do I know that proof is genuine?
True, this was an instance of what a real AI could say, not something I would say playing the AI with the rules given. Or something I might say to survive the first few seconds. The original question in this thread was:
what might an AI say to save or free itself?