If we’re already sacrificing max utility to create a myopic agent that’s lower risk, why would we not also want it to maximize temporal average rather than ensemble average to reduce wipeout risk?
If we’re already sacrificing max utility to create a myopic agent that’s lower risk, why would we not also want it to maximize temporal average rather than ensemble average to reduce wipeout risk?