If you start thinking that way, then why do any experiments at all?
An experiment could have results that allow the agent to become a more effective paperclip maximizer.
Firstly, an objective morality—assuming such a thing exists, that is—would probably have something to say about paperclips, in the same way that gravity and electromagnetism have things to say about paperclips.
I’m not sure how that would work, but if it did, the paperclip maximizer would just use its knowledge of morality to create paperclips. It’s not as if action x being moral automatically means that it produces more paperclips. And even if it did, that would just mean that a paperclip minimizer would start acting immorally.
I am getting the feeling that you’re assuming there’s something in the agent’s code that says, “you can look at and change any line of code you want, except lines 12345..99999, because that’s where your terminal goals are”. Is that right?
It’s perfectly capable of changing its terminal goals. It just generally doesn’t, because this wouldn’t help accomplish them. It doesn’t self-modify out of some desire to better itself. It self-modifies because that’s the action that produces the most paperclips. If it considers changing itself to value staples instead, it would realize that this action would actually cause a decrease in the number of paperclips, and reject it.
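
To make that concrete, here is a toy sketch in Python. Everything in it is made up for illustration: the `World` record, the three action names, and the forecast numbers are assumptions of mine, not a description of any real agent. The point it tries to show is only that "rewrite my goal" is an ordinary option on the list, and that it gets scored by the utility function the agent has at the moment of choosing.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class World:
    """Hypothetical summary of a predicted future."""
    paperclips: int
    staples: int


def predict(action: str) -> World:
    """Toy world model: invented forecasts for each candidate action."""
    forecasts = {
        "keep_goal_build_factory": World(paperclips=1000, staples=0),
        "keep_goal_do_nothing": World(paperclips=10, staples=0),
        # After rewriting its goal, the future agent makes staples instead.
        "rewrite_goal_to_staples": World(paperclips=10, staples=1000),
    }
    return forecasts[action]


def paperclip_utility(world: World) -> int:
    """The agent's current terminal goal: count paperclips, nothing else."""
    return world.paperclips


def staple_utility(world: World) -> int:
    """What the agent would value if it had been built to want staples."""
    return world.staples


def choose_action(utility: Callable[[World], int], actions: List[str]) -> str:
    """Pick the action whose predicted outcome scores highest under `utility`."""
    return max(actions, key=lambda action: utility(predict(action)))


ACTIONS = [
    "keep_goal_build_factory",
    "keep_goal_do_nothing",
    "rewrite_goal_to_staples",
]

# Nothing forbids the rewrite; it simply loses under the current goal.
best_for_clippy = choose_action(paperclip_utility, ACTIONS)
best_for_stapler = choose_action(staple_utility, ACTIONS)
```

Under these invented numbers the paperclip agent picks the factory, and the rewrite is rejected only because it forecasts fewer paperclips. No lines of code are protected. Swap in `staple_utility` and the same procedure picks the rewrite, which is the sense in which the choice depends entirely on the goal doing the evaluating.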