“Sure, Rohin thought that was a major problem, but we [our organization/thought cluster/ideological group] never agreed with him.”
Oh really? Did you ever explicitly highlight this particular disagreement at the time?
FWIW, at the time I wasn’t working on value learning and wasn’t especially excited about work in that direction, even though that’s what the rest of my lab was primarily focussed on. I also wrote a blog post in 2020, based on a conversation I had with Rohin in 2018, where I mention how important it is to work on inner alignment and how those issues were raised by the ‘paranoid wing’ of AI alignment. My guess is that my view was something like “stuff like reward learning from the state of the world doesn’t seem super important to me because of inner alignment etc., but for all I know cool stuff will blossom out of it, so I’m happy to hear about your progress and try to offer constructive feedback”, and that I expressed that to Rohin in person.
Of course, the fact that I think the same thing now as I did in 2020 isn’t much evidence that I’m right.