Interesting, I didn’t know that. But it seems like that assumes that o1’s special-sauce training can be viewed as a kind of RLHF, right? Do we know enough about that training to know that it’s RLHF-ish, or at least some clearly offline approach?