There’s a decent amount of literature on using multiple rewards, though often it’s framed as learning about multiple goals. Here are some off the top of my head:
The Horde (classic): http://www.ifaamas.org/Proceedings/aamas2011/papers/A6_R70.pdf
Universal Value Function Approximators: http://proceedings.mlr.press/v37/schaul15.html
Learning to Act by Predicting the Future: https://arxiv.org/abs/1611.01779
Temporal Difference Models: https://arxiv.org/abs/1802.09081
Successor Features: https://papers.nips.cc/paper/2017/hash/350db081a661525235354dd3e19b8c05-Abstract.html
Also see the discussion of prediction heads in Appendix D of the OpenAI Five paper (https://cdn.openai.com/dota-2.pdf); they're used mostly for interpretability/diagnostics.
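In case it's helpful, here's a rough sketch of what that kind of setup looks like: a shared torso, a value head that actually drives the RL update, and extra heads that are read out purely for diagnostics. This is my own minimal PyTorch version, not the actual OpenAI Five architecture; the layer sizes and head names are made up for illustration.

```python
import torch
import torch.nn as nn

class ValueWithAuxHeads(nn.Module):
    """Shared torso + training value head + diagnostic-only prediction heads."""

    def __init__(self, obs_dim=128, hidden=256):
        super().__init__()
        self.torso = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.value_head = nn.Linear(hidden, 1)      # drives the RL update
        self.win_prob_head = nn.Linear(hidden, 1)   # read out for diagnostics only
        self.net_worth_head = nn.Linear(hidden, 1)  # read out for diagnostics only

    def forward(self, obs):
        h = self.torso(obs)
        return {
            "value": self.value_head(h),
            "win_prob": torch.sigmoid(self.win_prob_head(h)),
            "net_worth": self.net_worth_head(h),
        }
```

The point is that the auxiliary heads share the torso's representation but their gradients don't have to feed back into the policy objective at all, so you get interpretable readouts essentially for free.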
+1, was going to comment something similar.
You probably want to look at successor features in particular (run a full literature search and follow the citations; there are multiple papers). That's exactly the setting where you have a multidimensional value function but no multidimensional policy learning. Successor Features for Transfer in Reinforcement Learning (the paper John linked) specifically addresses your motivation 2; I wouldn't be surprised if some follow-up paper (or even that paper) addresses motivation 1 as well.
Most other papers (including Universal Value Function Approximators) are trying to learn policies that can accomplish multiple different goals, so aren’t as relevant.
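To make the "multidimensional value function, but no multidimensional policy learning" point concrete, here's a minimal tabular sketch of successor features. It assumes the reward decomposes as r(s, a) = φ(s, a)·w for known features φ, as in the Barreto et al. paper; all names and sizes below are illustrative, not from any of the linked papers.

```python
import numpy as np

n_states, n_actions, d = 5, 2, 3   # d = dimension of the reward features phi
gamma, alpha = 0.9, 0.1

rng = np.random.default_rng(0)
phi = rng.random((n_states, n_actions, d))  # hypothetical reward features

# psi[s, a] estimates E[sum_t gamma^t phi_t | s_0=s, a_0=a, policy pi]:
# a vector-valued value function for a single fixed policy.
psi = np.zeros((n_states, n_actions, d))

def td_update(s, a, s_next, a_next):
    """One TD(0) update of the successor features, applied component-wise."""
    target = phi[s, a] + gamma * psi[s_next, a_next]
    psi[s, a] += alpha * (target - psi[s, a])

def q_values(w):
    """Scalar Q-values for any reward weights w, with no further learning."""
    return psi @ w   # shape (n_states, n_actions)
```

Because Q^π(s, a) = ψ^π(s, a)·w, swapping in new reward weights w gives you the corresponding value function by a dot product, which is what makes this useful for transfer (motivation 2) without ever learning a multidimensional policy.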
Awesome, thanks so much!!!