It can be both, of course. Start with process supervision but combine it with… something else. It’s hard to learn how to reason from scratch, but it’s also clearly not doing pure strict imitation learning, because the transcripts & summaries are just way too weird to be any kind of straightforward imitation of expert transcripts (or even ones collected from users or the wild).
Wouldn’t that conflict with the quote? (Though maybe they’re not doing what they’ve implied in the quote)