(I assume you are asking “why do we assume the agent has a coherent utility function” rather than “why do we assume the agent tries maximizing their utility” ? )
Agents that, like humans, don’t have such a nice utility function:
Are vulnerable to money pumping (see the sketch below)
Can notice that problem and try to repair themselves
Note that we humans do in practice try to repair ourselves, e.g. by suppressing our own emotions in order to be more productive. But we don’t have access to our own source code, so we’re not very good at it
I think that if an AI can’t repair that part of itself and is still vulnerable to money pumping, then it’s not the AGI we’re afraid of
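To make the money-pumping point concrete, here’s a minimal toy sketch (my own illustration, not from the talk): a hypothetical agent with cyclic preferences, preferring A over B, B over C, and C over A, that will pay a small fee for any trade up to something it prefers. A trader can then cycle it around the loop and drain it indefinitely:

```python
# Toy money pump: a hypothetical agent with cyclic (intransitive) preferences.
# It prefers A over B, B over C, and C over A, and will pay a small fee
# for any trade up to something it prefers, so a trader can cycle it forever.

FEE = 1  # what the agent will pay for each trade it "wants"

# For each item, the item the agent prefers to it (note the cycle).
prefers = {"B": "A", "C": "B", "A": "C"}

def run_money_pump(start_item: str, rounds: int) -> int:
    """Offer the agent the trade it prefers each round; return total fees paid."""
    item, paid = start_item, 0
    for _ in range(rounds):
        item = prefers[item]  # the agent happily accepts the "upgrade"...
        paid += FEE           # ...and pays the fee every single time
    return paid

# After any multiple of 3 rounds the agent holds exactly what it started with,
# but it has paid one fee per round:
print(run_money_pump("A", rounds=9))  # -> 9
```

An agent whose preferences come from a coherent (transitive) utility function can’t be cycled like this, which is the point of the coherence arguments.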
Adding: my opinion comes from this MIRI/Yudkowsky talk. I linked to the relevant place; he speaks about this in the next 10-15 minutes or so of the video
Yes, you can. One mathy example is in the source I mentioned in my subcomment (sorry for not linking again, I’m on mobile). Another is gambling, I guess? And probably other addictions too?