Let’s start with the template for an AGI, the seed for a generally intelligent expected-utility maximizer capable of recursive self-improvement.
As far as I can tell, the implementation of such a template would do nothing at all because its utility-function would be a “blank slate”.
What happens if you now encode the computation of Pi in its utility-function? Would it reflect on this goal and try to figure out its true goals? Why would it do so, and where would the incentive come from?
Would complex but implicit goals change its behavior? Why would it improve upon its goals, and why would it even try to preserve them in their current form, if it has no explicit incentive to do so? It seems that if it indeed has an incentive to make its goals explicit, given an implicit utility-function, then that incentive must be a presupposition inherent in the very definition of a generally intelligent expected-utility maximizer capable of recursive self-improvement.
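To make the question concrete, here is a minimal sketch, assuming the “template” is just an argmax-over-expected-utility loop with a pluggable utility-function. All function names here are hypothetical and chosen only for illustration; this is not anyone’s actual design.

```python
def expected_utility(action, sample_outcome, utility_fn, n_samples=100):
    """Estimate E[U(outcome)] of an action by sampling outcomes from a world model."""
    return sum(utility_fn(sample_outcome(action)) for _ in range(n_samples)) / n_samples

def choose_action(actions, sample_outcome, utility_fn):
    """Pick the action with the highest estimated expected utility."""
    if utility_fn is None:
        # The "blank slate" case: with no utility-function, nothing ranks above
        # anything else, so the template has no basis for doing anything at all.
        return None
    return max(actions, key=lambda a: expected_utility(a, sample_outcome, utility_fn))

# Encoding "the computation of Pi" in the utility-function: the agent is scored
# only on the digits its actions produce; reflecting on the goal is not scored.
def pi_digit_utility(outcome):
    return outcome.get("pi_digits_computed", 0)
```

With utility_fn=None the loop has nothing to maximize and selects no action; with pi_digit_utility it simply pursues digits of Pi. Any incentive to examine, rewrite, or preserve that goal would have to enter through the utility-function or the world model, which is exactly the question raised above.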
> What happens if you now encode the computation of Pi in its utility-function? Would it reflect on this goal and try to figure out its true goals? Why would it do so, and where would the incentive come from?
So: the general story is that to be able to optimise, agents have to build a model of the world—in order to predict the consequences of their possible actions. That model of the world will necessarily include a model of the agent—since it is an important part of its own local environment. That model of itself is likely to include its own goals—and it will use Occam’s razor to build a neat model of them. Thus goal reflection—Q.E.D.
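A sketch of that story, under the assumption that the agent plans by simulating successor world-states that contain a description of itself (the names and data structures below are illustrative only):

```python
from dataclasses import dataclass

@dataclass
class WorldState:
    environment: dict
    agent_description: dict  # the agent's own goals/program, as modelled by the agent

def predict(state: WorldState, action: str) -> WorldState:
    """Toy transition model used for planning. Because the agent is part of its
    own local environment, every predicted successor state carries a description
    of the agent, and a compact (Occam-style) model of those states therefore
    includes a compact model of the agent's own goals."""
    successor_env = dict(state.environment, last_action=action)
    return WorldState(environment=successor_env,
                      agent_description=dict(state.agent_description))

def plan(state: WorldState, actions, utility_fn):
    """One-step expected-utility planning over predicted successor states."""
    return max(actions, key=lambda a: utility_fn(predict(state, a)))
```

On this reading, the incentive to represent its own goals explicitly is not an extra presupposition; it falls out of the same modelling machinery the agent already needs in order to predict the consequences of its actions.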
> It seems that if it indeed has an incentive to make its goals explicit, given an implicit utility-function, then that incentive must be a presupposition inherent in the very definition of a generally intelligent expected-utility maximizer capable of recursive self-improvement.
That is the general idea of universal instrumental values, yes.
I am aware of that argument, but I don’t find it particularly convincing.
Universal values are very similar to universal ethics, and for the same reasons that I don’t think an AGI will be friendly by default, I don’t think it will protect its goals or undergo recursive self-improvement by default. Maximizing expected utility, just like friendliness, is something that needs to be explicitly defined; otherwise there will be no incentive to do so.
> Universal values are very similar to universal ethics, and for the same reasons that I don’t think an AGI will be friendly by default, I don’t think it will protect its goals or undergo recursive self-improvement by default.
I’m not really sure what you mean by “by default”. The idea is that a goal-directed machine that is sufficiently smart will tend to do these things (unless its utility function says otherwise), at least if you can set it up so that it doesn’t become a victim of the wirehead or pornography problems.
IMO, there’s a big difference between universal instrumental values and values to do with being nice to humans. The first type you get without asking; the second you have to deliberately build in. It doesn’t make much sense to lump these ideas together and reject both of them on the same grounds, as you seem to be doing.