(I assume you are asking “why do we assume the agent has a coherent utility function” rather than “why do we assume the agent tries maximizing their utility” ? )
Agents that, like humans, don’t have such a nice utility function:
Are vulnerable to money pumping (see the sketch below)
Can notice that problem and try to repair themselves
Note that we humans do in practice try to repair ourselves, e.g. by suppressing our own emotions in order to be more productive. But we don’t have access to our own source code, so we’re not very good at it
I think that if an AI can’t repair that part of itself and is still vulnerable to money pumping, then it’s not the AGI we’re afraid of
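To make the money-pumping point concrete, here’s a minimal toy sketch (my own illustration, not from the talk): a hypothetical agent with cyclic preferences, preferring A over B, B over C, and C over A, that will pay a small fee for any trade up to something it prefers. A trader can then cycle it around the loop and drain it indefinitely:

```python
# Toy money pump: a hypothetical agent with cyclic (intransitive) preferences.
# It prefers A over B, B over C, and C over A, and will pay a small fee
# for any trade up to something it prefers, so a trader can cycle it forever.

FEE = 1  # what the agent will pay for each trade it "wants"

# For each item, the item the agent prefers to it (note the cycle).
prefers = {"B": "A", "C": "B", "A": "C"}

def run_money_pump(start_item: str, rounds: int) -> int:
    """Offer the agent the trade it prefers each round; return total fees paid."""
    item, paid = start_item, 0
    for _ in range(rounds):
        item = prefers[item]  # the agent happily accepts the "upgrade"...
        paid += FEE           # ...and pays the fee every single time
    return paid

# After any multiple of 3 rounds the agent holds exactly what it started with,
# but it has paid one fee per round:
print(run_money_pump("A", rounds=9))  # -> 9
```

An agent whose preferences come from a coherent (transitive) utility function can’t be cycled like this, which is the point of the coherence arguments.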
Adding: my opinion comes from this MIRI/Yudkowsky talk. I linked to the relevant place; he speaks about this in the next 10-15 minutes or so of the video
Yes, you can. One mathy example is in the source I mentioned in my subcomment (sorry for not linking again, I’m on mobile). Another is gambling, I guess? And probably other addictions too?