It doesn’t make sense to me either, but it does seem to invalidate the “bootstrapping” results for the other 3 models. Maybe it’s because they could batch all reward model requests into one instance.
When MS doesn’t have enough compute to do their evals, the rest of us may struggle!