People at OpenAI regularly say things like
Our current path [to solve alignment] is very promising (https://twitter.com/janleike/status/1562501343578689536)
[...] even without fundamentally new alignment ideas, we can likely build sufficiently aligned AI systems to substantially advance alignment research itself (https://openai.com/blog/our-approach-to-alignment-research/ )
And you say:
OpenAI leadership tend to put more likelihood on slow takeoff, are more optimistic about the possibility of solving alignment, especially via empirical methods that rely on capabilities
AFAICT, no one from OpenAI has publicly explained why they believe that RLHF + amplification will be enough to safely train systems that can solve alignment for us. The blog post linked above says “we believe” four times, but does not take the time to explain why anyone believes these things.
Writing up this kind of reasoning is time-intensive, but I think it would be worth it: if you’re right, then the value of information for the rest of the community is huge; if you’re wrong, it’s an opportunity to change your minds.
Probably true at the time, but in December Jan Leike did write in some detail about why he’s optimistic about OpenAI’s approach: https://aligned.substack.com/p/alignment-optimism