Something typically only has to be beneficial on average for reinforcement learning to favour it.
Even that is not required. See: gambling addiction. Our ability to learn from reinforcement is not calibrated according to the ideal.
Sure: I should have used a different term in place of “beneficial”.