I also haven’t seen this mentioned anywhere.
I think most commercial frontier models that offer logprobs take some precautions against distillation. Some seem to add noise to the logprobs too (DeepSeek?), and some, like Grok, only offer the top 8 rather than the top 20. Others don’t offer them at all.
It’s a shame, as logprobs can be a really information-rich and token-efficient way to do evals, ranking, and judging.
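To make the judging point concrete, here is a rough sketch of how a single output token's logprobs can be turned into a graded score instead of a hard yes/no. It assumes you already have the top logprobs for the first answer token as a plain dict of token to logprob; the function name `judge_score` and the token spellings are made up for illustration and don't come from any particular provider's API.

```python
import math

def judge_score(top_logprobs, yes_tokens=("Yes", "yes", " Yes"), no_tokens=("No", "no", " No")):
    """Return P(yes) renormalised over the yes/no probability mass.

    top_logprobs: dict mapping token string -> log probability for ONE
    token position (hypothetical input shape, adapt to your provider).
    Returns None if none of the yes/no tokens appear in the top-k list.
    """
    # Convert logprobs back to probabilities and pool spelling variants.
    p_yes = sum(math.exp(lp) for tok, lp in top_logprobs.items() if tok in yes_tokens)
    p_no = sum(math.exp(lp) for tok, lp in top_logprobs.items() if tok in no_tokens)
    total = p_yes + p_no
    if total == 0.0:
        return None  # answer tokens fell outside the returned top-k
    return p_yes / total

# Made-up numbers: the judge puts 60% on "Yes", 30% on "No", 10% elsewhere.
example = {"Yes": math.log(0.6), "No": math.log(0.3), "Maybe": math.log(0.1)}
score = judge_score(example)  # 0.6 / (0.6 + 0.3)
```

One generated token gives you a continuous confidence you can rank on, which is where the token efficiency comes from. It also shows why a top-8 cap or added noise hurts: if the answer tokens fall outside the returned list, or the values are perturbed, the score is missing or unreliable.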
Deleted tweet. Why were they sceptical? And does anyone know if there were follow-up antibody tests? I can’t find any.