How do you define “appropriate”? It seems a little circular. Friendly AI is AI that optimises for appropriate values, and appropriate values are the ones for which we’d want a Friendly AI to optimise.
You might say that “appropriate” values are the ones “we” would like the future optimised towards, but whether such values even exist humanity-wide is an open question (and I’m leaning towards “no”). If they don’t, you should probably have a contingency definition for what to do in that case.
I would also be shocked if there were a “provable” definition of “appropriate” (as opposed to the friendliness of the program being provable with respect to some definition of “appropriate”).