We have gotten this feedback from a handful of people, so we want to reread the links and the wider literature on o1 and its evaluation to check whether we got the point right or mischaracterized the situation.
We will probably change the phrasing (either to make our criticism clearer or to correct it) in the next minor update.
Thanks for the comment!