Well, I don’t know that we know enough to say what is most promising, but what I’m most excited to explore is my own approach, which suggests we need to investigate ways to get the content of AI and human thought aligned along a preference ordering. I don’t think this is by any means easy, but I don’t really see another practical framework in which to approach this. The framework of course admits many possible techniques, but I think it’s useful to keep in mind so that we don’t get confused (as often happens in existing imitation learning papers) about how much we can actually know about the values of humans and AIs.