Reinforcement learning with rewards or punishments that can have infinite magnitude seems to make intuitive sense to me. The buck is then passed to the question of whether it’s ever reasonable to give a sample a post-finite reward. Say there are pictures labeled as either “woman”, “girl”, “boy” or “man”, and labeling a boy a man or a man a boy gets you a Small reward while labeling a man a man gets you a Large reward, where Large is infinite with respect to Small. With a finite version, enough “boy” vs “girl” weight could overcome a “man” vs “girl” weight, which might be undesirable behaviour (if you strictly care about gender discrimination with no tradeoff for age discrimination).
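
One way to make this concrete, as a rough sketch rather than a definitive construction: represent each reward as a pair (Large units, Small units) and compare totals lexicographically, so that no amount of Small can ever outweigh a single Large. Everything named here (`GENDER`, `lex_reward`, `total_lex`, `total_finite`, the finite values 100 and 1, and the choice of giving nothing for a wrong-gender label) is my own assumption for illustration and not part of the setup above.

```python
# Hypothetical mapping from label to gender, used only for this illustration.
GENDER = {"woman": "f", "girl": "f", "man": "m", "boy": "m"}


def lex_reward(true_label, predicted):
    """Return the reward as a pair (large_units, small_units)."""
    if predicted == true_label:
        return (1, 0)  # exact match: one Large
    if GENDER[predicted] == GENDER[true_label]:
        return (0, 1)  # right gender, wrong age: one Small
    return (0, 0)      # wrong gender: nothing (an assumption)


def total_lex(pairs):
    """Sum rewards componentwise; Python compares tuples lexicographically."""
    rewards = [lex_reward(t, p) for t, p in pairs]
    return (sum(r[0] for r in rewards), sum(r[1] for r in rewards))


def total_finite(pairs, large=100.0, small=1.0):
    """Finite version: Large is merely a bigger number than Small."""
    n_large, n_small = total_lex(pairs)
    return n_large * large + n_small * small


# Policy A: one exact match, then 999 wrong-gender labels.
policy_a = [("man", "man")] + [("boy", "girl")] * 999
# Policy B: 1000 right-gender, wrong-age labels and no exact match.
policy_b = [("man", "boy")] * 1000

# Lexicographic totals: A's single Large beats any number of Smalls.
print(total_lex(policy_a) > total_lex(policy_b))
# Finite totals: enough Smalls add up and overtake the Large.
print(total_finite(policy_a) > total_finite(policy_b))
```

With the pair representation the first comparison favours policy A, while in the finite version the accumulated Small rewards of policy B overtake it, which is the kind of tradeoff the infinite version is meant to rule out.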