I don’t see how that addresses the problem. You’re linking to a philosophical answer, and this is an engineering problem.
The claim you made, some posts ago, was “we can set an AI’s goals by reference to a human’s utility function.” Many folks objected that humans don’t really have utility functions. My objection was “we have no idea how to extract a utility function, even given complete data about a human’s brain.” Defining “utility function” isn’t a solution. If you want to use “the utility function of a particular human” in building an AI, you need not only a definition, but a construction. To be convincing in this conversation, you would need to at least give some evidence that such a construction is possible.
You are trying to use, as a subcomponent, something we have no idea how to build and that seems possibly as hard as the original problem. And this isn’t a good way to do engineering.
I expect an AGI to receive a mathematical definition of its utility function as input, so there is no need for a “construction”. I don’t even know what a “construction” would be in this context.
Note that in my formal definition of intelligence, we can use any appropriate formula* in the given formal language as a utility function, since it all comes down to computing logical expectation values. In fact, I expect a real seed AGI to work by computing logical expectation values (using an approximate method, probably some kind of Monte Carlo).
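To make the Monte Carlo remark concrete, here is a minimal sketch. It is ordinary Monte Carlo over sampled outcomes, not a logical expectation value, and everything in it is an assumption for illustration: the names `monte_carlo_expectation`, `sample_world` and `utility`, and the toy world of three coin flips, are hypothetical and are not part of the formal definition discussed above.

```python
import random
from typing import Callable, Tuple

World = Tuple[bool, bool, bool]  # hypothetical toy "world": three coin flips


def monte_carlo_expectation(
    utility: Callable[[World], float],
    sample_world: Callable[[random.Random], World],
    n_samples: int = 10_000,
    seed: int = 0,
) -> float:
    """Approximate E[utility(world)] by averaging over sampled worlds."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    total = 0.0
    for _ in range(n_samples):
        total += utility(sample_world(rng))
    return total / n_samples


def sample_world(rng: random.Random) -> World:
    """Draw three independent fair coin flips (illustrative assumption)."""
    return (rng.random() < 0.5, rng.random() < 0.5, rng.random() < 0.5)


def utility(world: World) -> float:
    """Toy utility: the number of heads in the sampled world."""
    return float(sum(world))


# The exact expectation for this toy case is 1.5; the estimate should be close.
estimate = monte_carlo_expectation(utility, sample_world)
```

The point of the sketch is only that the utility function enters as a definition to be evaluated, and the estimator never needs a closed form for the expectation. Whether anything like this carries over to logical expectation values is exactly the open question.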
Of course, if the AGI design we come up with is only defined for a certain category of utility functions, then we need to somehow project into this category (assuming the category is rich enough that the projection doesn’t lose too much information). Constructing this projection operator might indeed be very difficult.
In practice, I formulated the definition with utility = Solomonoff expectation value of something computable, but this restriction isn’t necessary. Note that my proposal for defining logical probabilities admits self-reference, in the sense that the reasoning system is allowed to speak of the probabilities it assigns (as in Christiano et al.).