I usually assume “provably friendly” means “will provably optimise for complex human-like values correctly” and thus includes both actual humanity-wide values and one person’s values (and one could plausibly switch between the two options at a late stage of the design process).
And, well, I meant for it to be a little funny, so I’ll take that as a win!
Friendly means something like “will optimize for the appropriate complex human-like values correctly.”
Saying “we don’t have clear criteria for appropriate human values” is just another way of saying that defining Friendly is hard.
Provably Friendly means we have a mathematical proof that an AI will be Friendly before we start running the AI.
An AI that gives its designer ultimate power over humanity is almost certainly not Friendly, even if it were Provably designer-godlike-powers-implementing.
How do you define “appropriate”? It seems a little circular. Friendly AI is AI that optimises for appropriate values, and appropriate values are the ones for which we’d want a Friendly AI to optimise.
You might say that “appropriate” values are ones which “we” would like to see the future optimised towards, but I think whether these even exist humanity-wide is an open question (and I’m leaning towards “no”), in which case you should probably have a contingency definition for what to do if they, in fact, do not.
I would also be shocked if there were a “provable” definition of “appropriate” (as opposed to the friendliness of the program being provable with respect to some definition of “appropriate”).
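To spell out that last distinction, here is a rough sketch in made-up notation (U, Optimizes, and Appropriate are placeholders of my own, not anything with an agreed formal definition): fix a candidate specification U of “the appropriate values”. Then

\[
\text{Provably Friendly w.r.t. } U:\qquad \vdash\ \forall e.\ \mathrm{Run}(\mathrm{AI}, e) \rightarrow \mathrm{Optimizes}(\mathrm{AI}, U, e)
\]
\[
\text{Provably appropriate } U:\qquad \vdash\ \mathrm{Appropriate}(U)
\]

The first is a claim about a particular program relative to a fixed U, and is at least the sort of thing a proof could in principle establish before the AI is run; the second would require a formal criterion for “Appropriate” itself, which is exactly what we don’t have.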