I was taking it as “solves” or “gets pretty close to solving”. Maybe that’s a misinterpretation on my part. What did you mean here?
No, that is not a misinterpretation: I do think that this research agenda has the potential to get pretty close to solving outer alignment. More specifically, if it is (practically) possible to solve outer alignment through some form of reward learning, then I think this research agenda will establish how that can be done (and prove that this method works), and if it isn’t possible, then I think this research agenda will produce a precise understanding of why that isn’t possible (which would in turn help to inform subsequent research). I don’t think this research agenda is the only way to solve outer alignment, but I think it is the most promising way to do it.