What’s your plan for inner alignment?
Jan Leike has written about inner alignment here: https://aligned.substack.com/p/inner-alignment. (I’m at OpenAI; imo, I’m not sure this will work in the worst case, and I’m hoping we can come up with a more robust plan.)
Yeah, though he hasn’t specced out his plan.
I can’t speak for OpenAI, but maybe the hope is that we don’t need to solve inner alignment in step 1. In step 1, we figure out how to get our narrow-ish, not-yet-superintelligent systems to help us with alignment research, even though they aren’t fully aligned and can’t be trusted to scale up to superintelligence or to learn certain dangerous skills. Then in step 2, we solve inner alignment and all remaining alignment problems with the help of those systems.
Interesting idea. I guess that could be worth a shot if we lack anything better.
I don’t work for OpenAI. I just saw Sam Altman tweet this post, so I linkposted it here.