Not necessarily. You are assuming that she has an explicit utility function, but that need not be the case.
Good point. May I ask, is “explicit utility function” standard terminology, and if so, is there a good reference somewhere that explains it? It took me a long time to realize the interesting difference between humans, who engage in moral philosophy and often can’t tell you what their goals are, and my model of paperclippers. I also think that not understanding this difference is a big reason why people don’t understand the orthogonality thesis.
No, I do not believe that it is standard terminology, though you can find a decent reference here.
They’re often called explicit goals, not utility functions. “Utility function” is terminology from a very specific moral philosophy.
Also note that the orthogonality thesis depends on an explicit goal structure. Without such an architecture it should be called the orthogonality hypothesis.