I think it would be valuable if someone would write a post that does (parts of) the following:
- summarize the landscape of work on getting LLMs to reason;
- sketch out the tree of possibilities for how o1 was trained and how it works at inference time;
- select a "most likely" path in that tree and describe in detail one possibility for how o1 works.
I would find this valuable because external safety work needs an understanding of how frontier models actually work; otherwise it is impossible to point out theoretical or conceptual flaws in the labs' alignment approaches.
One caveat: writing such a post could be considered an infohazard. I'm personally not too worried about this, since I expect every big lab is already doing the same analysis internally and independently, so the post would not speed up innovation at any of the labs.