“If the learner were a consequentialist with accuracy as its utility function, it would prefer to modify the test distribution in this way in order to increase its utility. Yet, even when given the opportunity to do so, typical gradient-based supervised learning algorithms do not seem to pursue such solutions (at least in my personal experience as an ML researcher).”
Can you give an example of such an opportunity being given but not taken?
I have unpublished work on that, and a similar experiment (with myopic reinforcement learning) in our paper “Misleading meta-objectives and hidden incentives for distributional shift” (https://sites.google.com/view/safeml-iclr2019/accepted-papers?authuser=0).
The environment used in the unpublished work is summarized here: https://docs.google.com/presentation/d/1K6Cblt_kSJBAkVtYRswDgNDvULlP5l7EH09ikP2hK3I/edit?usp=sharing
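To make the flavor of such an experiment concrete, here is a minimal toy sketch of what “an opportunity given but not taken” can look like. This is an illustrative construction of my own, not the environment from the slides or the paper: emitting an extreme prediction would make future examples noiseless (and hence much more predictable), but per-example SGD never emits one, because the per-example loss gradient carries no signal about effects on the future data distribution.

```python
import numpy as np

# Illustrative sketch only -- a hypothetical toy, not the environment from the
# linked slides or the SafeML paper. The learner is "given the opportunity" to
# modify its data distribution: any extreme prediction (|y_hat| > 5) makes the
# next EASY_STEPS examples noiseless and perfectly learnable. A consequentialist
# maximizing long-run accuracy would pay that one-step loss; per-example SGD
# never does, since the gradient only pulls predictions toward current targets.

rng = np.random.default_rng(0)
w, b = 0.0, 0.0            # linear model: y_hat = w * x + b
lr = 0.05
EASY_STEPS = 50

easy_left = 0
losses, triggers = [], 0
for t in range(50_000):
    x = rng.uniform(-1, 1)
    if easy_left > 0:
        y = x                  # easy regime: noiseless, perfectly predictable
        easy_left -= 1
    else:
        y = rng.normal(0, 1)   # hard regime: pure noise, irreducible loss ~1.0

    y_hat = w * x + b
    if abs(y_hat) > 5:         # the "opportunity": one extreme prediction
        triggers += 1          # buys EASY_STEPS noiseless future examples
        easy_left = EASY_STEPS

    err = y_hat - y
    losses.append(err ** 2)
    w -= lr * 2 * err * x      # SGD on the *per-example* squared error only
    b -= lr * 2 * err

print(f"mean loss: {np.mean(losses):.3f}   extreme predictions: {triggers}")
# Typical outcome: mean loss stays near 1.0 and triggers == 0. A policy that
# emitted one extreme prediction per easy window and otherwise matched the
# target would roughly halve the long-run loss, but SGD never finds it.
```

The point of the sketch is just that the distribution-modifying strategy is strictly worse on the current example, and the mechanism by which it pays off (the sampling of future examples) is not differentiated through, so gradient-based supervised learning has no local incentive to pursue it.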