If an eval is mandated by law, then it will be run even if it required logprobs.
I won’t hold my breath.
I think commercial companies would often open up raw logprobs, but there’s not much demand, and the logprobs are not really logprobs. The problem is that the leading model owners won’t do so, and those are the important ones to benchmark. I have little interest in the creativity of random little Llama finetunes no one uses.
True, I should have said leading commercial companies.