It seems like there must be some decent ways to see how different two classifiers are, but I can only think of unprincipled things.
Two ideas:
Sample a lot of items and use both models to generate two rankings of the items (or log odds, or some other per-item score). Models that give similar scores to lots of examples are probably pretty similar. One problem with this: if you optimize for dissimilarity and the problem is too easy, you'll train your model to solve the problem some arbitrary way and then invert the ordering within each class, so the scores look maximally different even though every actual decision is unchanged. (A related approach with the same problem is judging model similarity by how similarly the models respond to deleting parts of the image.) There's a sketch of the comparison below.
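Here's a minimal sketch of that comparison for a binary classifier, assuming each model gives you a 1-D array of log-odds scores for the same items (the name `score_agreement` and its interface are hypothetical). It also reports the within-class rank correlation, which is exactly where the invert-within-class trick shows up: decision agreement stays near 1 while the within-class correlations get driven toward -1.

```python
import numpy as np
from scipy.stats import spearmanr

def score_agreement(scores_a, scores_b, labels):
    """Compare two binary classifiers' log-odds scores on the same items.

    scores_a, scores_b: per-item log odds from each model.
    labels: each item's true class, used only to expose the
    invert-within-class failure mode described above.
    """
    scores_a = np.asarray(scores_a)
    scores_b = np.asarray(scores_b)
    labels = np.asarray(labels)
    # Overall rank correlation of the two score orderings.
    overall, _ = spearmanr(scores_a, scores_b)
    # Decision agreement: do the models threshold each item the same way?
    agree = np.mean((scores_a > 0) == (scores_b > 0))
    # Within-class rank correlation: a model trained to look "different"
    # can drive this toward -1 while leaving every decision unchanged.
    within = {c: spearmanr(scores_a[labels == c], scores_b[labels == c])[0]
              for c in np.unique(labels)}
    return overall, agree, within
```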
Maybe you could split the models into two parts, which we might hope act as a “feature extractor” part and a “simple classifier” part. (Potentially a reconstruction loss could be added at the split to encourage the features to stay feature-y, but maybe it’s not too important.) Then you measure how different two models are by training a third classifier that’s given access to the features from both models and seeing by how much it outperforms the originals: if the combination beats either model alone, the two feature sets must carry non-redundant information. See the sketch below.
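A sketch of that third classifier, assuming you've already run each model's "feature extractor" half over the same items (the function name and shapes are assumptions, and for brevity it fits and evaluates on the same batch; in practice you'd train on one split and report accuracy on held-out data):

```python
import torch
import torch.nn as nn

def combined_probe_accuracy(feat_a, feat_b, labels, n_classes, epochs=200):
    """Train a linear probe on the concatenated features of two models.

    feat_a, feat_b: float tensors of shape (n_items, d_a) and (n_items, d_b),
    the outputs of each model's hoped-for feature-extractor half.
    labels: long tensor of shape (n_items,).
    """
    x = torch.cat([feat_a, feat_b], dim=1)
    probe = nn.Linear(x.shape[1], n_classes)
    opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(probe(x), labels).backward()
        opt.step()
    # If this beats both original models' accuracy, their features carry
    # non-redundant information, i.e. the models genuinely differ.
    with torch.no_grad():
        return (probe(x).argmax(dim=1) == labels).float().mean().item()
```

A linear probe keeps the "simple classifier" part simple, so any accuracy gain is attributable to the combined features rather than to extra modeling capacity.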