Yep. Nothing there about loopholes. “I will not kill you” and then instead killing everyone I love, is still a credible commitment. If I kill myself out of despair afterwards it might get a bit greyer, but it’s still kept it’s commitment.
I meant credible in the game theoretic sense. A credible commitment to me is one where you wind up losing more by breaking our commitment than any gain you make from breaking it.
Example: (one line proof of a reliable kill switch for the AI, given in exchange for some agreed upon split of stars in the galaxy.)
“Credibly”.
Credibly: Capable of being believed; plausible.
Yep. Nothing there about loopholes. “I will not kill you” and then instead killing everyone I love, is still a credible commitment. If I kill myself out of despair afterwards it might get a bit greyer, but it’s still kept it’s commitment.
I meant credible in the game theoretic sense. A credible commitment to me is one where you wind up losing more by breaking our commitment than any gain you make from breaking it. Example: (one line proof of a reliable kill switch for the AI, given in exchange for some agreed upon split of stars in the galaxy.)