Nice idea! A variation on this would be to first run a model as usual, saving the top logits for each output token. Then give this output to a second “inspector” model, which has to answer: does the output contain any obvious errors, can those errors be attributed to sampling issues, and could a correct output be constructed from the base model’s logits?
This would be useful for better understanding the limitations of a specific model: is it really being held back by the sampling method? It would also be useful for sampling-methods research, by finding cases where sampling fails so that better algorithms can be devised.
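A minimal sketch of that pipeline, assuming a Hugging Face causal LM; the model names, the k=5 cutoff, and the inspector prompt wording are illustrative assumptions, not a fixed recipe:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def generate_with_top_logits(model_name, prompt, k=5, max_new_tokens=64):
    """Sample an output as usual, keeping the top-k logits at every step."""
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    inputs = tok(prompt, return_tensors="pt")
    out = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        output_scores=True,            # keep per-step scores (processed logits)
        return_dict_in_generate=True,
    )
    gen_ids = out.sequences[0, inputs["input_ids"].shape[1]:]
    steps = []
    for step, scores in enumerate(out.scores):
        top = torch.topk(scores[0], k)  # top-k alternatives at this step
        steps.append({
            "chosen": tok.decode([gen_ids[step].item()]),
            "top_alternatives": [
                (tok.decode([i.item()]), round(v.item(), 2))
                for v, i in zip(top.values, top.indices)
            ],
        })
    return tok.decode(gen_ids), steps

def build_inspector_prompt(prompt, output_text, steps):
    """Compose the question for the inspector model from the saved trace."""
    trace = "\n".join(
        f"{i}: chose {s['chosen']!r}, alternatives {s['top_alternatives']}"
        for i, s in enumerate(steps)
    )
    return (
        f"Prompt: {prompt}\n"
        f"Model output: {output_text}\n"
        f"Per-token top logits:\n{trace}\n\n"
        "Questions: (1) Does the output contain obvious errors? "
        "(2) Can those errors be attributed to sampling, i.e. was a better "
        "token available among the top logits? (3) Could a correct output be "
        "constructed from the base model's logits alone?"
    )
```

The inspector prompt here is just plain text fed to whatever second model you like; the interesting design question is how much of the logit trace it actually needs to see.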