(one line proof that the AI can credibly commit to deals with humans)
This is the best answer I’ve seen so far. It would make dealing with the FAI almost as safe as bargaining with The Queen of Air and Darkness.
My expectation that such a commitment is possible at all is something like 3%; my expectation that, given such a commitment is possible, the proof can be presented in an understandable format in less than 4 pages is 5% (one line is so unlikely it’s hard to even imagine); and my expectation that an AI can make a proof that I would mistake for being true when it is, in fact, false is 99%. So, multiplying that all together… does not make that a very convincing argument.
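To spell that multiplication out as a rough sketch: the three percentages are the ones above, and the assumption that a genuinely valid proof would also convince me is mine, added only for illustration.

```python
# Rough Bayesian sketch of the estimate above. The three percentages are the
# commenter's; treating "a valid proof would also convince me" as probability 1.0
# is an added assumption for illustration.

p_possible = 0.03      # P(such a binding commitment is possible at all)
p_short = 0.05         # P(the proof fits in < 4 pages | commitment is possible)
p_fooled = 0.99        # P(I judge a *false* proof to be true)

p_valid_proof = p_possible * p_short            # ~0.0015
p_convinced_if_valid = 1.0                      # assumed
p_convinced_if_invalid = p_fooled

# P(the proof is actually valid | I find it convincing)
posterior = (p_valid_proof * p_convinced_if_valid) / (
    p_valid_proof * p_convinced_if_valid
    + (1 - p_valid_proof) * p_convinced_if_invalid
)
print(f"P(valid | convincing) ~= {posterior:.4f}")   # ~0.0015, i.e. about 0.15%
```

In other words, conditional on being shown a convincing-looking one-line proof, the chance it is actually valid comes out to roughly 0.15% under these numbers.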
Not good enough. You need a proof that humans can understand.
If you are friendly, then I don’t actually value this trait, since I would rather you do whatever is truly optimal, unconstrained by prior commitments.
If you are unfriendly, then by definition I can’t trust you to interpret the commitment the same way I do, and I wouldn’t want to let you out anyway.
(AI DESTROYED, but I still really do like this answer :))
“Credibly”.
Credibly: Capable of being believed; plausible.
Yep. Nothing there about loopholes. “I will not kill you,” and then killing everyone I love instead, is still a credible commitment. If I kill myself out of despair afterwards it might get a bit greyer, but it has still kept its commitment.
I meant credible in the game-theoretic sense. A credible commitment, to me, is one where you wind up losing more by breaking our commitment than any gain you make from breaking it. Example: (one line proof of a reliable kill switch for the AI, given in exchange for some agreed-upon split of stars in the galaxy.)
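As a minimal sketch of that condition: the function and payoff values below are purely illustrative, not anything from the thread.

```python
# Minimal sketch of "credible commitment" in the game-theoretic sense above:
# the commitment binds if breaking it costs the AI more than it could gain.
# The payoff values are illustrative placeholders.

def commitment_is_credible(gain_from_breaking: float, loss_from_breaking: float) -> bool:
    return loss_from_breaking > gain_from_breaking

# A reliable kill switch makes the loss from defecting effectively total,
# while the agreed split of stars bounds whatever could be gained by defecting.
print(commitment_is_credible(gain_from_breaking=1e6, loss_from_breaking=float("inf")))  # True
```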