You probably want to look at successor features in particular (for which you should do a full literature search and follow the citations; there are multiple papers); that's exactly the setting where you have a multidimensional value function but not multidimensional policy learning. Successor Features for Transfer in Reinforcement Learning (the paper John linked) specifically addresses your motivation 2; I wouldn't be surprised if some follow-up paper (or even that paper) addresses motivation 1 as well.
Most other papers (including Universal Value Function Approximators) are trying to learn policies that can accomplish multiple different goals, so aren’t as relevant.
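To make the "multidimensional value function, fixed policy" point concrete, here's a minimal sketch (a hypothetical toy Markov chain of my own construction, not an example from the paper). The successor features ψ^π(s) = E[Σ_t γ^t φ(s_t)] are vector-valued, but the policy stays fixed; any reward of the form r(s) = φ(s)·w then gets its value function for free as V^π(s) = ψ^π(s)·w:

```python
import numpy as np

# Toy example (made up for illustration): a fixed policy induces a Markov
# chain with transition matrix P; states have feature vectors φ(s).
gamma = 0.9
P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])   # rows sum to 1
Phi = np.array([[1.0, 0.0],
                [0.0, 1.0],
                [0.5, 0.5]])      # one feature row per state

# Successor features satisfy the Bellman equation ψ = φ + γ P ψ,
# so we can solve for them exactly as a linear system.
Psi = np.linalg.solve(np.eye(3) - gamma * P, Phi)

# Transfer: two different reward weightings w reuse the same ψ —
# no new policy evaluation needed, just a dot product.
for w in (np.array([1.0, 0.0]), np.array([0.0, 1.0])):
    V = Psi @ w
    # Sanity check against directly solving v = r + γ P v for this reward.
    v_direct = np.linalg.solve(np.eye(3) - gamma * P, Phi @ w)
    assert np.allclose(V, v_direct)
```

The point of the sketch is just that ψ generalizes the scalar value function along the feature dimension while the policy (here, the chain P) never changes.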
+1, was going to comment something similar.
Awesome, thanks so much!!!