[Question] Which approach is most promising for aligned AGI?
I know that this is something of a speculative question and that people will have wildly different views here, but it seems important to try to have an idea of which approaches are more likely than others to lead to aligned AGI. It’s okay to argue that multiple approaches are promising, but you might want to consider separate answers if you have a reasonable amount to write about each approach.
Mod edit note: Made this into a question for you. You created it as an ordinary post.
Do you mean an approach for building it, or a general alignment research avenue? For example, agent foundations is not an approach to building aligned AGI; it’s an approach to understanding intelligence better that may later significantly help in building aligned AGI.
This question is specifically about building it, but that’s a worthwhile clarification.
Well, I don’t know that we know enough to say what is most promising, but what I’m most excited to explore is my own approach, which suggests we need to investigate ways to get the content of AI and human thought aligned along preference ordering. I don’t think this is by any means easy, but I don’t really see another practical framework in which to approach the problem. This framework of course admits many possible techniques, but I think it’s useful to keep it in mind and not get confused (as often happens in existing imitation learning papers) about how much we can actually know about the values of humans and AIs.
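To make "aligned along preference ordering" concrete, here is a minimal sketch of the simplest version of that idea: treating alignment as pairwise agreement between a human's and an AI's preference orderings over the same set of outcomes. The function name, the toy outcomes, and the agreement measure are all illustrative assumptions of mine, not part of the commenter's proposal.

```python
# Hypothetical illustration: "alignment" as the fraction of outcome pairs on
# which two preference orderings agree. A real proposal would have to handle
# uncertainty about both orderings, which is exactly the difficulty the
# comment points at.
from itertools import combinations

def preference_agreement(human_ranking, ai_ranking):
    """Fraction of outcome pairs ordered the same way by both rankings.

    Each ranking is a list of outcomes, most preferred first, and both
    rankings are assumed to cover the same outcomes.
    """
    human_pos = {o: i for i, o in enumerate(human_ranking)}
    ai_pos = {o: i for i, o in enumerate(ai_ranking)}
    pairs = list(combinations(human_ranking, 2))
    agree = sum(
        (human_pos[a] < human_pos[b]) == (ai_pos[a] < ai_pos[b])
        for a, b in pairs
    )
    return agree / len(pairs)

# Identical orderings give 1.0; a single swapped pair lowers the score.
print(preference_agreement(["cure disease", "status quo", "paperclips"],
                           ["cure disease", "status quo", "paperclips"]))  # 1.0
print(preference_agreement(["cure disease", "status quo", "paperclips"],
                           ["cure disease", "paperclips", "status quo"]))  # ~0.67
```

This is essentially a Kendall-tau-style agreement score; the hard part the comment gestures at is that neither ordering is directly observable, so any practical technique has to estimate them from behavior or stated preferences.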