I’m not Eliezer, but my high-level attempt at this:
[...] The things I’d mainly recommend are interventions that:
- Help ourselves think more clearly. (I imagine this including a lot of trying-to-become-more-rational, developing and following relatively open/honest communication norms, and trying to build better mental models of crucial parts of the world.)
- Help relevant parts of humanity (e.g., the field of ML, or academic STEM) think more clearly and understand the situation.
- Help us understand and resolve major disagreements. (Especially current disagreements, but also future disagreements, if we can e.g. improve our ability to double-crux in some fashion.)
- Try to solve the alignment problem, especially via novel approaches.
  - In particular: the biggest obstacle to alignment seems to be ‘current ML approaches are super black-box-y and produce models that are very hard to understand/interpret’; finding ways to better understand models produced by current techniques, or finding alternative techniques that yield more interpretable models, seems like where most of the action is.
- Think about the space of relatively-plausible “miracles” [i.e., positive model violations], think about future evidence that could make us quickly update toward a miracle-claim being true, and think about how we should act to take advantage of that miracle in that case.
- Build teams and skills that are well-positioned to take advantage of miracles when and if they arise. E.g., build some group like Redwood into an org that’s world-class in its ability to run ML experiments, so we have that capacity already available if we find a way to make major alignment progress in the future.
  - This can also include indirect approaches, like ‘rather than try to solve the alignment problem myself, I’ll try to recruit physicists to work on it, because they might bring new and different perspectives to bear’.
    - Though I definitely think there’s a lot to be said for more people trying to solve the alignment problem themselves, even if they’re initially pessimistic they’ll succeed!
I think alignment is still the big blocker on good futures, and still the place where we’re most likely to see crucial positive surprises, if we see them anywhere—possibly Eliezer would disagree here.