Julian, unsympathetic aliens might well develop an instinct to keep their promises. I happen to think that even paperclip maximizers might one-box on Newcomb’s Problem (and by extension, cooperate on the true one-shot Prisoner’s Dilemma with a partner who they believe can predict their decision). They just wouldn’t like each other, or have any kind of “honor” that depends on imagining yourself in the other’s shoes.
Latanius, a Friendly AI the way I’ve described it is a CEV-optimizer, not something that feels sympathetic to humans. Human sympathy is one way of being friendly; it’s not the only way or even the most reliable way. For FAI-grade problems it would have to be exactly the right kind of sympathy at exactly the right kind of meta-level for exactly the right kind of environmental processes that, as it so happens, work extremely differently from the AI. If the optimizer you’re creating is not a future citizen but a nonsentient means to an end, you just write a utility function and are done with it.
Mike Blume, the hypothesis would be “human sociopaths have empathy but not sympathy”.