Several quick thoughts about reinforcement learning:

Has anybody tried a “decaying”/“bored” reward that decreases when the agent performs the same action over and over? It resembles the habituation/addiction mechanism in mammals and might be a clever trick for mitigating reward hacking.

An additional thought: what about multiplicative reward? Suppose we have several reward functions that are easy to evaluate from sensory data and that each correlate with the real utility function. Does combining them multiplicatively make reward hacking harder, since the agent must keep every proxy high at once?
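Both ideas can be sketched in a few lines. This is a toy illustration, not an established algorithm: the class name, the `decay`/`recovery` parameters, and their default values are all made up for the example. The “boredom” shaper attenuates the reward of a repeated action and lets unused actions recover; the multiplicative combiner zeroes the total reward if any single proxy is driven to zero.

```python
class BoredRewardShaper:
    """Toy "boredom" reward wrapper (hypothetical parameters).

    Repeating the same action raises its boredom level toward 1,
    attenuating its reward; actions not taken slowly recover.
    """

    def __init__(self, n_actions, decay=0.5, recovery=0.1):
        self.decay = decay        # how fast a repeated action gets "boring"
        self.recovery = recovery  # how fast unused actions recover
        self.boredom = [0.0] * n_actions

    def shape(self, action, raw_reward):
        # Attenuate reward by the current boredom for this action.
        shaped = raw_reward * (1.0 - self.boredom[action])
        for a in range(len(self.boredom)):
            if a == action:
                # Chosen action: boredom moves toward 1.
                self.boredom[a] += self.decay * (1.0 - self.boredom[a])
            else:
                # Other actions: boredom decays toward 0.
                self.boredom[a] *= 1.0 - self.recovery
        return shaped


def multiplicative_reward(proxy_rewards):
    # Product of several proxy rewards: hacking one proxy is useless
    # if doing so collapses any other proxy to zero.
    product = 1.0
    for r in proxy_rewards:
        product *= r
    return product
```

With the defaults above, repeating the same action halves its shaped reward each step, which is exactly the “bored” behaviour the idea calls for; whether this helps against reward hacking in practice is an open question.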