eggsyntax comments on LLMs Look Increasingly Like General Reasoners

eggsyntax 9 Nov 2024 18:58 UTC
1 point
Interesting, I didn’t know that. But it seems like that assumes that o1’s special-sauce training can be viewed as a kind of RLHF, right? Do we know enough about that training to know that it’s RLHF-ish, or at least that it’s some clearly offline approach?