I agree with your general comments, and I’d like to add a few observations of my own.
Reading the paper Reward is Enough, what strikes me most is that it is reductionist almost to the point of self-parody.
Take a sentence like:
The reward-is-enough hypothesis postulates that intelligence, and its associated abilities, can be understood as subserving the maximisation of reward by an agent acting in its environment.
I could rewrite this as:
The physics-is-enough hypothesis postulates that intelligence, and its associated abilities, can be understood as being the laws of physics acting in an environment.
If I apply that rewrite throughout the paper, I do not have to change any of the supporting arguments put forward by the authors: they support the physics-is-enough reductionist hypothesis equally well.
The authors of ‘Reward is Enough’ posit that reward explains everything, so you might expect them to be keen to look closely at the internal structure of actual reward signals, both those that exist in the wild and those that might be designed. However, they are deeply uninterested in this. In fact, they explicitly invite others to join them in solving the ‘challenge of sample-efficient reinforcement learning’ without ever doing any such thing.
Like you, I feel that this lack of interest in the details of reward signals is not very helpful when it comes to AI safety. I like the multi-objective approach (see my comments here), but my own recent work (like this) has been about moving even further away from the scalar-reward hypothesis/paradigm, towards building useful models of aligned intelligence that do not depend purely on the idea of reward maximisation. In that recent paper (mostly in section 7), I also develop some thoughts about why most ML researchers seem so interested in the problem of designing reward signals.
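To make concrete what I mean by moving beyond a single scalar, here is a minimal toy sketch. It is entirely my own illustration, not anything taken from the paper or from my work linked above; all of the names, numbers, and the lexicographic rule in it are made up.

```python
# A toy illustration only: the objective names, the numbers, and the
# lexicographic rule below are invented for this comment, not taken from
# 'Reward is Enough' or from the linked paper.
from typing import NamedTuple

class VectorReward(NamedTuple):
    task_progress: float   # hypothetical objective 1
    safety_margin: float   # hypothetical objective 2

def scalarise(r: VectorReward, weights) -> float:
    # The scalar-reward paradigm collapses the vector into one number up front;
    # the trade-off between objectives is hidden inside the choice of weights.
    return weights[0] * r.task_progress + weights[1] * r.safety_margin

def prefer_lexicographically(a: VectorReward, b: VectorReward) -> bool:
    # A multi-objective agent can instead rank outcomes without ever committing
    # to a single scalar, e.g. 'never trade safety_margin for task_progress'.
    if a.safety_margin != b.safety_margin:
        return a.safety_margin > b.safety_margin
    return a.task_progress > b.task_progress

a = VectorReward(task_progress=0.9, safety_margin=0.1)
b = VectorReward(task_progress=0.5, safety_margin=0.8)

print(scalarise(a, (2.0, 0.5)), scalarise(b, (2.0, 0.5)))  # 1.85 vs 1.4: a ranked higher
print(scalarise(a, (0.5, 2.0)), scalarise(b, (0.5, 2.0)))  # 0.65 vs 1.85: b ranked higher
print(prefer_lexicographically(b, a))                      # True, regardless of any weights
```

The only point of the toy is that the scalar ranking flips depending on which weights you pick, whereas a multi-objective comparison can keep the trade-off explicit instead of baking it into a single number up front.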