Deep reinforcement learning agents will not come to intrinsically and primarily value their reward signal; reward is not the trained agent’s optimization target.
I have no stake in this debate, but how is this particular point any different from what Eliezer says when he points out that humans do not optimize for inclusive genetic fitness (IGF)? I think the entire mesa-optimization concern is built on this premise, no?