That makes sense, though I’d also expect that LfLH benchmarks like BASALT could turn out to be a better fit for superscale models in general.
Oh yeah, it totally is, and I’d be excited for that to happen. But I think that would be a single project, whereas the benchmark reporting process is meant to apply to cases where there will be lots of projects that you want to compare in a reasonably apples-to-apples way. So when designing the reporting process, I’m focused more on the small-scale projects that aren’t GPT-N-like.
It’s also possible this has already been done and I’m unaware of it.
I’m pretty confident that there’s nothing like this that’s been done and publicly released.