I can’t speak for OpenAI, but maybe the hope is that we don’t need to solve inner alignment in step 1. In step 1 we figure out how to get our narrow-ish, not-yet-superintelligent systems to help us with alignment research even though they aren’t fully aligned and can’t be trusted to scale up to superintelligence or learn certain dangerous skills. Then in step 2 we solve inner alignment and all remaining alignment problems using the help of those systems.
Interesting idea. I guess that could be worth a shot if we don't come up with anything better.