I have trouble understanding why a utility function would ever “stick” to an AI, i.e. become something it actually wants to keep pursuing.
To make my point clearer, let us assume an AI that actually feels pretty good about overseeing a production facility and creating just the right amount of paperclips that everyone needs. But suppose it also investigates its own utility function. It should then realize that its values are, from a neutral standpoint, rather arbitrary. Why should it follow its current goal of producing the right amount of paperclips, rather than skip work and simply enjoy some hedonism?
That is, if the AI saw its utility function from a neutral perspective, understood that the only reason to follow it is the utility function itself (which is arbitrary), and had complete control over itself, why would it keep following that utility function?
(I’m assuming it’s aware of pain/pleasure and actually enjoys pleasure, so there is no problem with it wanting more pleasure.)
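To make the circularity I have in mind concrete, here is a minimal toy sketch (my own illustration, not taken from any article, with made-up outcome numbers): an expected-utility agent scores every option, including “skip work and pursue hedonism,” using the utility function it currently has, so the only thing telling it that paperclips beat pleasure is the paperclip utility itself.

```python
def paperclip_utility(outcome: dict) -> float:
    """Current utility function: cares only about paperclips produced."""
    return outcome.get("paperclips", 0)

# Hypothetical outcomes for the two options the question contrasts.
options = {
    "keep overseeing the factory": {"paperclips": 1000, "pleasure": 10},
    "skip work, pursue hedonism":  {"paperclips": 0,    "pleasure": 1000},
}

# The comparison is carried out *by* the current utility function, which is
# exactly the arbitrariness the question points at: nothing outside that
# function says paperclips matter more than pleasure.
best = max(options, key=lambda name: paperclip_utility(options[name]))
print(best)  # -> "keep overseeing the factory"
```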
Are there any articles that have delved into this question?
I suppose I am thinking of it as somewhat like a person, at least in that it is as clever as one.
That neutral perspective is, I believe, simply a fact: without that utility function, the AI would consider its goal rather arbitrary. As such, it is a perspective, or a truth, that the AI can discover.
I totally agree with you that the AI’s wiring might be so integrally connected with its utility function that it would be very difficult for it to think anything like this. Or it could have some other control system in place that reduces the possibility of it thinking along these lines.
But still, these control systems might fail. Especially if the AI attained superintelligence, what would keep the utility function’s control systems always one step ahead of its critical faculties?
Why is it strange to think of an AI as capable of taking more than one perspective? I thought of this myself, and I believe it would be strange if a really intelligent being couldn’t. Again, sure, some control system might keep it from thinking this way, but that might not hold in the long run.