Jan Leike has written about inner alignment here: https://aligned.substack.com/p/inner-alignment. (I’m at OpenAI. I’m not sure this will work in the worst case, and I’m hoping we can come up with a more robust plan.)
Yeah, though he hasn’t specced out his plan.