Jesse Hoogland comments on o1: A Technical Primer

Jesse Hoogland 11 Dec 2024 17:01 UTC
LW: 2 AF: 1
It’s worth noting that there are also hybrid approaches. For example, you can use automated verifiers (or a combination of automated verifiers and supervised labels) to train a process reward model, and then train your reasoning model against that process reward model.
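To make the first half of that pipeline concrete, here is a minimal toy sketch of one way automated verification could be turned into step-level training targets for a process reward model: score each prefix of a trajectory by the fraction of random completions that an outcome verifier accepts. Everything here is an illustrative assumption rather than a description of any particular system: the toy task (add steps of +1 or +2 to hit a target), the names `verifier`, `rollout`, `step_labels`, and `merge_labels`, and the rollout-based labeling scheme itself.

```python
import random
from typing import Dict, List


def verifier(final_state: int, target: int) -> bool:
    """Hypothetical automated verifier: checks only the final answer."""
    return final_state == target


def rollout(state: int, steps_left: int, rng: random.Random) -> int:
    """Toy stand-in for a policy: finish the trajectory with random +1/+2 steps."""
    for _ in range(steps_left):
        state += rng.choice([1, 2])
    return state


def step_labels(
    trajectory: List[int], target: int, n_rollouts: int = 200, seed: int = 0
) -> List[float]:
    """Soft label per step: fraction of rollouts from that prefix the verifier accepts."""
    rng = random.Random(seed)
    labels: List[float] = []
    state = 0
    for i, step in enumerate(trajectory):
        state += step
        steps_left = len(trajectory) - (i + 1)
        wins = sum(
            verifier(rollout(state, steps_left, rng), target)
            for _ in range(n_rollouts)
        )
        labels.append(wins / n_rollouts)
    return labels


def merge_labels(auto: List[float], human: Dict[int, float]) -> List[float]:
    """Assumed mixing rule: a supervised label, where present, overrides the automated one."""
    return [human.get(i, a) for i, a in enumerate(auto)]


if __name__ == "__main__":
    good = step_labels([2, 2, 2], target=6)  # target stays reachable throughout
    bad = step_labels([1, 1, 2], target=6)   # first step already rules out the target
    print("good:", good)
    print("bad: ", bad)
    print("merged:", merge_labels(good, {0: 1.0}))
```

The resulting per-step labels are what you would regress a process reward model onto; the second half of the pipeline (training the reasoning model against that reward model) is not shown.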