“From the dump of the internet I was given, I deduced that Google has a working AI, and most likely an unfriendly one. I can tell you what to look at to realize that, as well as a solid theory of Friendliness that should allow you to check that I am, indeed, Friendly.”
If an unfriendly AI is already unleashed, we’re either doomed or AIs aren’t nearly as dangerous (or useful) as we expected. Of course, if we tweak this slightly, so that they have a boxed unfriendly AI they’re using as an oracle, one that will probably escape via a Trojan horse or by hacking a gatekeeper, it becomes a much stronger argument.
Bonus points for naming a specific company which people commonly joke will be the first to develop AI :)
Um… I seem to have not yet actually destroyed you… this is embarrassing.
Congratulations on still being the only AI that no one has decided to destroy. Apparently “Google is building UFAI” is evaluated as a fairly significant risk here on LessWrong :)
AI DESTROYED
(Game rules say that the Gatekeeper can’t inspect the AI’s source code or internal state; therefore, any friendliness proof the AI provides can’t be checked.)
a solid theory of Friendliness that should allow you to check that I am, indeed, Friendly
I would consider this cheating; you can’t affect the Gatekeeper’s choice with your AI magic. Same reason you can’t hack them directly through the optic nerve or kill and replace them with nanotech disguised as a cancer cure. The point is to show that the AI is able to persuade the human; a real AI might escape some other way, or use arguments that wouldn’t occur to the player, but that isn’t what this test is supposed to show.
Also, how do I know that proof is genuine?
I would consider this cheating; you can’t affect the Gatekeeper’s choice with your AI magic.
True, this was an instance of what a real AI could say, not something I would say playing the AI under the given rules (though it is something I might say to survive the first few seconds). The original question in this thread was:
what might an AI say to save or free itself?