Humans have values that differ from maximizing the reward circuitry in our brains, yet those values still form reliably. These underlying values are what cause us not to wirehead with respect to the outer optimizer of reward.
Is there an already written expansion of this?