A lot of the particulars of humans’ values are heavily reflective. Two examples:
- A large chunk of humans’ terminal values involves their emotional/experience states—happy, sad, in pain, delighted, etc.
- Humans typically ~terminally want some control over their own futures.
Contrast that to e.g. a blue-minimizing robot, which just tries to minimize the amount of blue stuff in the universe. That utility function involves reflection only insofar as the robot is (or isn’t) blue.
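To make the contrast concrete, here’s a minimal sketch (all names hypothetical, not anyone’s actual formalism) of the difference: the blue-minimizer’s utility function reads only external world state, whereas a reflective utility function takes the agent’s own internal state as part of its argument.

```python
from dataclasses import dataclass, field

@dataclass
class WorldState:
    blue_objects: int = 0   # stuff "out there" in the world
    # the agent's own internal/experiential state (hypothetical representation)
    agent_emotions: dict = field(default_factory=dict)

def blue_minimizer_utility(world: WorldState) -> float:
    """Non-reflective: cares only about external blue stuff.
    The agent's own internal state never enters the calculation
    (except insofar as the robot itself happens to be blue)."""
    return -world.blue_objects

def reflective_utility(world: WorldState) -> float:
    """Reflective: the utility depends directly on the agent's own
    emotional/experience states, not just the external world."""
    emotions = world.agent_emotions
    return emotions.get("happiness", 0.0) - emotions.get("pain", 0.0)
```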