There’s a decent amount of literature on using multiple rewards, though often it’s framed as learning about multiple goals. Here are some off the top of my head:
The Horde (classic): http://www.ifaamas.org/Proceedings/aamas2011/papers/A6_R70.pdf
Universal Value Function Approximators: http://proceedings.mlr.press/v37/schaul15.html
Learning to Act by Predicting the Future: https://arxiv.org/abs/1611.01779
Temporal Difference Models: https://arxiv.org/abs/1802.09081
Successor Features: https://papers.nips.cc/paper/2017/hash/350db081a661525235354dd3e19b8c05-Abstract.html
Also see the discussion of prediction heads in Appendix D of the OpenAI Five paper (https://cdn.openai.com/dota-2.pdf); they're used mostly for interpretability/diagnostics.
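In case it's helpful, here's a rough sketch of what that kind of setup looks like: a shared torso, a value head that actually drives the RL update, and extra heads that are read out purely for diagnostics. This is my own minimal PyTorch version, not the actual OpenAI Five architecture; the layer sizes and head names are made up for illustration.

```python
import torch
import torch.nn as nn

class ValueWithAuxHeads(nn.Module):
    """Shared torso + training value head + diagnostic-only prediction heads."""

    def __init__(self, obs_dim=128, hidden=256):
        super().__init__()
        self.torso = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.value_head = nn.Linear(hidden, 1)      # drives the RL update
        self.win_prob_head = nn.Linear(hidden, 1)   # read out for diagnostics only
        self.net_worth_head = nn.Linear(hidden, 1)  # read out for diagnostics only

    def forward(self, obs):
        h = self.torso(obs)
        return {
            "value": self.value_head(h),
            "win_prob": torch.sigmoid(self.win_prob_head(h)),
            "net_worth": self.net_worth_head(h),
        }
```

The point is that the auxiliary heads share the torso's representation but their gradients don't have to feed back into the policy objective at all, so you get interpretable readouts essentially for free.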
+1, was going to comment something similar.
You probably want to look at successor features in particular (run a full literature search and follow the citations; there are multiple papers). That's exactly the setting where you have a multidimensional value function but no multidimensional policy learning. Successor Features for Transfer in Reinforcement Learning (the paper John linked) specifically addresses your motivation 2; I wouldn't be surprised if some follow-up paper (or even that paper) addresses motivation 1 as well.
Most other papers (including Universal Value Function Approximators) are trying to learn policies that can accomplish multiple different goals, so aren’t as relevant.
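To make the "multidimensional value function, but no multidimensional policy learning" point concrete, here's a minimal tabular sketch of successor features. It assumes the reward decomposes as r(s, a) = φ(s, a)·w for known features φ, as in the Barreto et al. paper; all names and sizes below are illustrative, not from any of the linked papers.

```python
import numpy as np

n_states, n_actions, d = 5, 2, 3   # d = dimension of the reward features phi
gamma, alpha = 0.9, 0.1

rng = np.random.default_rng(0)
phi = rng.random((n_states, n_actions, d))  # hypothetical reward features

# psi[s, a] estimates E[sum_t gamma^t phi_t | s_0=s, a_0=a, policy pi]:
# a vector-valued value function for a single fixed policy.
psi = np.zeros((n_states, n_actions, d))

def td_update(s, a, s_next, a_next):
    """One TD(0) update of the successor features, applied component-wise."""
    target = phi[s, a] + gamma * psi[s_next, a_next]
    psi[s, a] += alpha * (target - psi[s, a])

def q_values(w):
    """Scalar Q-values for any reward weights w, with no further learning."""
    return psi @ w   # shape (n_states, n_actions)
```

Because Q^π(s, a) = ψ^π(s, a)·w, swapping in new reward weights w gives you the corresponding value function by a dot product, which is what makes this useful for transfer (motivation 2) without ever learning a multidimensional policy.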
Awesome, thanks so much!!!