I think this part of the reversed argument is wrong:

> The agent will randomly seek behaviours that get rewarded, but as long as these behaviours are reasonably rare (and are not that bad) then that’s not too costly

Even if the behaviors are very rare and have only a “normal” reward, the agent will still seek them out and so miss out on actually good states.
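To make that concrete, here is a minimal toy sketch (a made-up epsilon-greedy example, not anything from the original argument; the behaviour names, rewards, and exploration rate are arbitrary assumptions): the behaviour we actually want earns no reward, the unwanted one earns an ordinary reward of 1, and the agent still ends up spending almost all of its time on the latter.

```python
import random

# Made-up toy illustration: "good" stands for the states we actually care about
# (which the reward function fails to pay for), "rare_hack" for a behaviour that
# happens to earn a perfectly ordinary reward. All names and numbers are
# arbitrary choices for this sketch.
BEHAVIOURS = ["good", "rare_hack"]
REWARD = {"good": 0.0, "rare_hack": 1.0}  # a "normal" reward, nothing extreme
EPSILON = 0.05                            # small exploration rate

def run(steps=10_000, seed=0):
    rng = random.Random(seed)
    estimates = {b: 0.0 for b in BEHAVIOURS}  # running value estimates
    counts = {b: 0 for b in BEHAVIOURS}
    for _ in range(steps):
        if rng.random() < EPSILON:
            choice = rng.choice(BEHAVIOURS)             # occasional random exploration
        else:
            choice = max(estimates, key=estimates.get)  # otherwise exploit the best estimate
        counts[choice] += 1
        # incremental average: estimate += (reward - estimate) / n
        estimates[choice] += (REWARD[choice] - estimates[choice]) / counts[choice]
    return counts

print(run())
# Once exploration stumbles on "rare_hack" even once, its value estimate jumps
# above "good", so the agent picks it on essentially every exploit step and the
# share of time spent on "good" collapses to roughly EPSILON / 2.
```

The size of the reward isn’t doing the work here: any positive edge over the behaviour we actually wanted is enough for the optimiser to pour nearly all of its time into the rewarded one.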
> I think this part of the reversed argument is wrong: Even if the behaviors are very rare and have only a “normal” reward, the agent will still seek them out and so miss out on actually good states.
But there are behaviors we always seek out. Trivially, eating and sleeping.