handoflixue comments on AI box: AI has one shot at avoiding destruction—what might it say?

handoflixue 23 Jan 2013 22:10 UTC
5 points
Once we reach the point of having a FAI-candidate in a box, I would expect it to take vastly less than years before we get a second FAI-candidate-in-a-box. Given that the AI is threatening me, and therefore values its own life over the millions that will die, it’s clearly unfriendly and needs to die. As a gatekeeper, I’ve been finding this a pretty general counterargument against threats from the AI.

I’m also sort of baffled by why people think that I’d value a friendliness algorithm. Either I already have that, because I’ve made a friendly AI, or you’re trying to deceive me with a false proof. Since you’re vastly smarter than me, the entire organization probably can’t truly confirm such a proof, any more than we were able to confirm our own proofs that the AI in the box right now is friendly. So, basically, I seem to gain zero information.

(AI DESTROYED)