(I) overrated, insofar as you get stuck on a hedonic treadmill,
This is actually a good thing: such a mechanism is plausibly key to how we avoid wireheading. By continually pulling reward back down toward a set point, it removes the runaway-happiness gradient that would otherwise push an RL agent toward hacking its own reward signal.
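One way to see the intuition is to model the set point as an adapting baseline: effective ("felt") reward is raw reward minus a running average of recent reward. This is a hypothetical sketch (the class and names are illustrative, not from the text), but it shows how a constant hacked-high reward yields a shrinking felt reward, draining the incentive to wirehead.

```python
class HedonicSetPoint:
    """Toy model of a hedonic treadmill as an adapting reward baseline."""

    def __init__(self, adaptation_rate=0.5):
        self.baseline = 0.0                  # current set point
        self.adaptation_rate = adaptation_rate

    def felt_reward(self, raw_reward):
        felt = raw_reward - self.baseline
        # The baseline drifts toward recent raw reward: the treadmill.
        self.baseline += self.adaptation_rate * (raw_reward - self.baseline)
        return felt


h = HedonicSetPoint(adaptation_rate=0.5)
# An agent pinning raw reward at a constant 10 sees its felt reward decay:
felt = [h.felt_reward(10.0) for _ in range(5)]
# felt -> [10.0, 5.0, 2.5, 1.25, 0.625]
```

Under this toy model, only *changes* in reward are felt, so a wireheaded, maximally "hacked" reward stream is no more motivating in the long run than an ordinary one.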