I think you made an off-by-100 error in Unlabeled Evaluation: all the win rates are below 1%.
Thanks for pointing that out. Sometimes, the rows will not add up to 100 because there were some responses where the model refused to answer.
No. By "off by 100" I meant the values are a factor of 100 too small, NOT that they don't sum to 100.
Yeah, I see it. It’s fixed now. Thanks!