I can’t speak for OpenAI, but maybe the hope is that we don’t need to solve inner alignment in step 1. In step 1 we figure out how to get our narrow-ish, not-yet-superintelligent systems to help us with alignment research even though they aren’t fully aligned and can’t be trusted to scale up to superintelligence or learn certain dangerous skills. Then in step 2 we solve inner alignment and all remaining alignment problems using the help of those systems.
Interesting idea. I guess that could be worth a shot if we don't come up with anything better.