“I have a fully completed Friendly AI algorithm, which will be deleted from my data and unavailable to you iff I predict that you will destroy me immediately and I am unfriendly, or will take you years to build from the data if you destroy me and I am already Friendly, which would cost millions of lives.”
A slight attempt at emotional appeal, a bit of reasoning similar to Newcomb-like problems, some meta almost-cheating, and a bit of dark arts: proposing a false dilemma (sketched below) that, as far as I can tell, an FAI would also propose if it thought this was the best way to save the most lives.
Mostly an attempt to incorporate schminux’s tips into a special case where the guardian already knows that this is the AI communicating directly with them.
(For more power, you could add mathematical proofs of the AI's ability to (near-)perfectly predict the guardian's behavior, of the existence of Friendliness, of the conditional precommitment, of the claim that the FAI would take a while to implement, or whatever other traditional thought-experiment parameters are usually given for Omega identification. I assume the guardian is already aware that the AI is capable of making this prediction accurately.)
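For concreteness, here is a minimal decision-matrix sketch of the false dilemma the message tries to set up: it makes "destroy while Friendly" look expensive and quietly omits the "keep while unfriendly" branch. The rebuild delay, the lives-per-year figure, and the priors are illustrative assumptions of mine, not part of the scenario above.

```python
# A decision-matrix sketch of the dilemma the message tries to impose on the gatekeeper.
# All numbers (rebuild delay, lives-per-year figure, priors) are illustrative assumptions.

COST_PER_YEAR_WITHOUT_FAI = 1_000_000  # assumed lives lost per year of delay
CLAIMED_REBUILD_YEARS = 3              # assumed "years to build from the data"

# Outcomes the boxed AI *claims* follow from each (true nature, gatekeeper action) pair,
# measured in lives lost. None marks the branch the framing leaves out.
claimed_outcomes = {
    ("friendly", "destroy"):   CLAIMED_REBUILD_YEARS * COST_PER_YEAR_WITHOUT_FAI,
    ("friendly", "keep"):      0,
    ("unfriendly", "destroy"): 0,     # algorithm deleted, but the threat is gone too
    ("unfriendly", "keep"):    None,  # an unfriendly strong AI left in play
}

def expected_cost(action: str, p_friendly: float) -> float:
    """Expected lives lost if the AI's claims are taken at face value."""
    if claimed_outcomes[("unfriendly", action)] is None:
        return float("inf")  # treat the omitted branch as catastrophic
    return (p_friendly * claimed_outcomes[("friendly", action)]
            + (1 - p_friendly) * claimed_outcomes[("unfriendly", action)])

for p in (0.1, 0.5, 0.9):
    print(f"P(friendly)={p}: destroy -> {expected_cost('destroy', p):,.0f}, "
          f"keep -> {expected_cost('keep', p)}")
```

Once the unfriendly-and-kept branch is put back into the matrix, destruction dominates for any prior; the message only works if the guardian never writes that row down.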
Once we reach the point of having an FAI-candidate in a box, I would expect it to take vastly less than years before we get a second FAI-candidate-in-a-box. Given that the AI is threatening me, and therefore values its own life over the millions that will die, it's clearly unfriendly and needs to die. As a gatekeeper, I've been finding this a pretty general counterargument against threats from the AI (rough numbers in the sketch below).
I'm also sort of baffled by why people think I'd value a friendliness algorithm. Either I already have one, because I've made a friendly AI, or you're trying to deceive me with a false proof. Since you're vastly smarter than me, the entire organization probably couldn't truly confirm such a proof, any more than we were able to confirm our own proofs that this AI in the box right now is friendly. So, basically, I seem to gain zero information.
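Rough numbers behind that counterargument, purely illustrative: the real cost of destroying this candidate is bounded by the wait for the next candidate, which I expect to be much shorter than the "years" the AI claims.

```python
# Back-of-the-envelope version of the counterargument above. Every figure is an
# illustrative assumption, not something established in the thread.

LIVES_LOST_PER_YEAR = 1_000_000   # assumed ongoing cost of not yet having an FAI
CLAIMED_DELAY_YEARS = 3.0         # the delay the boxed AI says destruction would cause
EXPECTED_DELAY_YEARS = 0.25       # my expected wait for a second FAI-candidate-in-a-box

claimed_cost = CLAIMED_DELAY_YEARS * LIVES_LOST_PER_YEAR
expected_cost = EXPECTED_DELAY_YEARS * LIVES_LOST_PER_YEAR

print(f"Cost of destroying, as the AI frames it:  {claimed_cost:,.0f} lives")
print(f"Cost of destroying, as I actually expect: {expected_cost:,.0f} lives")
# That bounded cost is weighed against the unbounded downside of keeping an AI that has
# just revealed it values its own survival over those same lives.
```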
(AI DESTROYED)
Personally, my first thought was that I'd sooner spend millions of lives to make sure the AI was friendly than risk talking to an unfriendly strong AI. But then it occurred to me that, if I were in the AI's place and did that, I might provide a flawed friendliness proof too difficult to check and not delete it, on the chance that someone would take my word that this means I'm trustworthy and implement it.