Friendly means something like “will optimize for the appropriate complex human-like values correctly.”
Saying “we don’t have clear criteria for appropriate human values” is just another way of saying that defining Friendly is hard.
Provably Friendly means we have a mathematical proof that an AI will be Friendly before we start running the AI.
An AI that gives its designer ultimate power over humanity is almost certainly not Friendly, even if it were Provably designer-godlike-powers-implementing.
How do you define “appropriate”? It seems a little circular. Friendly AI is AI that optimises for appropriate values, and appropriate values are the ones for which we’d want a Friendly AI to optimise.
You might say that “appropriate” values are ones which “we” would like to see the future optimised towards, but I think whether such values even exist humanity-wide is an open question (I’m leaning towards “no”), so you should probably have a contingency definition for what to do if they turn out not to exist.
I would also be shocked if there were a “provable” definition of “appropriate” (as opposed to the friendliness of the program being provable with respect to some definition of “appropriate”).
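To make that last distinction concrete, here is a minimal Lean sketch (the names FriendlyWrt, appropriate, and ai are all hypothetical, not drawn from any actual FAI formalism): what you could hope to prove is that a program is friendly relative to a supplied predicate “appropriate”; the predicate itself is an assumption you feed in, not a theorem you get out.

```lean
-- Toy model: outcomes the AI can bring about, the AI as a function from
-- inputs to outcomes, and "appropriate" as an assumed predicate on outcomes.
def FriendlyWrt {Input Outcome : Type}
    (appropriate : Outcome → Prop)   -- the spec we must supply; nothing proves it is the right one
    (ai : Input → Outcome) : Prop :=
  ∀ x : Input, appropriate (ai x)

-- A "Provably Friendly" claim is then a proof of `FriendlyWrt appropriate ai`
-- for some concrete `appropriate`. With a trivial spec the proof is easy;
-- nothing in it certifies that `appropriate` captures humanity-wide values.
example {Input Outcome : Type} (ai : Input → Outcome) :
    FriendlyWrt (fun _ => True) ai := by
  intro _; trivial
```

The point of the sketch is just that the proof obligation is always relative to whichever `appropriate` you plug in, which is exactly why a “provable” definition of “appropriate” itself would be surprising.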