I think that much of the difficulty with friendliness is that you can’t write down a simple utility function such that maximizing that utility is friendly. By “complex goal” I mean one which is sufficiently complex that articulating it precisely is out of our league.
I do believe that any two utility functions you can write down precisely should be basically equivalent in terms of how hard it is to verify that an AI follows them.