Humans have values that differ from maximizing the reward circuitry in our brains, yet those values still form reliably. These underlying values are what cause us not to wirehead with respect to the outer optimizer of reward.
Is there an already written expansion of this?