You probably want to look at successor features in particular (for which you should do a full literature search and follow the citations; there are multiple papers); that's exactly the setting where you have a multidimensional value function but not multidimensional policy learning. Successor Features for Transfer in Reinforcement Learning (the paper John linked) specifically addresses your motivation 2; I wouldn't be surprised if some follow-up paper (or even that paper) addresses motivation 1 as well.
Most other papers (including Universal Value Function Approximators) are trying to learn policies that can accomplish multiple different goals, so aren’t as relevant.
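To make the "multidimensional value function, fixed policy" point concrete, here's a minimal sketch (a hypothetical toy Markov chain of my own construction, not an example from the paper). The successor features ψ^π(s) = E[Σ_t γ^t φ(s_t)] are vector-valued, but the policy stays fixed; any reward of the form r(s) = φ(s)·w then gets its value function for free as V^π(s) = ψ^π(s)·w:

```python
import numpy as np

# Toy example (made up for illustration): a fixed policy induces a Markov
# chain with transition matrix P; states have feature vectors φ(s).
gamma = 0.9
P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])   # rows sum to 1
Phi = np.array([[1.0, 0.0],
                [0.0, 1.0],
                [0.5, 0.5]])      # one feature row per state

# Successor features satisfy the Bellman equation ψ = φ + γ P ψ,
# so we can solve for them exactly as a linear system.
Psi = np.linalg.solve(np.eye(3) - gamma * P, Phi)

# Transfer: two different reward weightings w reuse the same ψ —
# no new policy evaluation needed, just a dot product.
for w in (np.array([1.0, 0.0]), np.array([0.0, 1.0])):
    V = Psi @ w
    # Sanity check against directly solving v = r + γ P v for this reward.
    v_direct = np.linalg.solve(np.eye(3) - gamma * P, Phi @ w)
    assert np.allclose(V, v_direct)
```

The point of the sketch is just that ψ generalizes the scalar value function along the feature dimension while the policy (here, the chain P) never changes.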
+1, was going to comment something similar.
Awesome, thanks so much!!!