Minimizing Empowerment for Safety
I haven’t put much thought into this post; it’s off the cuff.
DeepMind has published a couple of papers on maximizing empowerment as a form of intrinsic motivation for Unsupervised RL / Intelligent Exploration.
I never looked at either paper in detail, but the basic idea is to maximize the mutual information between (future) outcomes and the agent's actions or policies/options. When this mutual information is high, the agent knows which strategy to follow to bring about a given outcome.
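For concreteness, empowerment at a state is usually formalized as a channel capacity between actions (or action sequences) and resulting future states; roughly (my paraphrase, not the exact objective from either paper):

$$\mathcal{E}(s) \;=\; \max_{p(a)} \; I\big(A;\, S' \mid S = s\big),$$

where A is the action (or option) chosen at s and S' is the resulting future state.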
It seems plausible that, in the setting where there is a reward function, minimizing empowerment instead could help steer an agent away from pursuing instrumental goals that have large effects on the world.
So that might be useful for “taskification”, “limited impact”, etc.
Discussed briefly in Concrete Problems, FYI: https://arxiv.org/pdf/1606.06565.pdf
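The simplest version I can imagine (just my sketch, not something from Concrete Problems or the empowerment papers) is to subtract an empowerment penalty from the task reward:

$$r'(s, a) \;=\; r(s, a) \;-\; \beta\, \mathcal{E}(s'),$$

where β controls how strongly the agent is pushed away from reaching states where it has a lot of control over the future.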
I would expect minimizing empowerment to impede the agent's ability to achieve its objectives. You do want the agent to have large effects on the parts of the environment that are relevant to its objectives, without being incentivized to negate those effects in weird ways in order to achieve low impact overall.
I think we need something like a sparse empowerment constraint, where you minimize empowerment over most (but not all) dimensions of the future outcomes.
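Here is a rough sketch of how that could look (PyTorch; everything below, including the ActionDecoder / shaped_reward names, the fixed task-irrelevant mask, and the Barber–Agakov-style variational lower bound with a uniform action prior, is my own illustration rather than anything from the papers above):

```python
# Sketch of a sparse empowerment penalty, under several assumptions of my own:
# discrete actions, a fixed boolean mask picking out the "task-irrelevant"
# dimensions of the future state, and a variational lower bound on
# I(A; masked(S')) used as the empowerment estimate.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class ActionDecoder(nn.Module):
    """q(a | masked future state): tries to recover the action from the
    task-irrelevant part of the future state. The better it can do this,
    the more control the agent has over those dimensions."""
    def __init__(self, masked_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(masked_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, masked_next_state: torch.Tensor) -> torch.Tensor:
        return self.net(masked_next_state)  # logits over actions


def empowerment_lower_bound(decoder, masked_next_state, actions, n_actions):
    """Barber-Agakov-style lower bound on I(A; masked(S')), assuming a
    uniform prior over actions: I >= log|A| + E[log q(a | s')]."""
    logits = decoder(masked_next_state)
    log_q = -F.cross_entropy(logits, actions, reduction="mean")
    return math.log(n_actions) + log_q


def shaped_reward(task_reward, decoder, next_state, actions, irrelevant_mask,
                  n_actions, beta=0.1):
    """Task reward minus a sparse empowerment penalty: only the dimensions
    flagged as task-irrelevant contribute to the penalty."""
    masked = next_state[:, irrelevant_mask]
    penalty = empowerment_lower_bound(decoder, masked, actions, n_actions)
    return task_reward - beta * penalty


# Toy usage: 8-dim state, dims 0-1 treated as task-relevant, 2-7 as irrelevant.
mask = torch.tensor([False, False] + [True] * 6)
decoder = ActionDecoder(masked_dim=6, n_actions=4)
s_next = torch.randn(32, 8)
a = torch.randint(0, 4, (32,))
r = torch.randn(32).mean()
print(shaped_reward(r, decoder, s_next, a, mask, n_actions=4))
```

In practice the decoder would be trained alongside the policy to keep the bound tight; a poorly trained decoder just makes the penalty an underestimate of the agent's empowerment over the masked dimensions.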