We introduced the concept of the space of models in terms of optimization and motivated the utility of gradients as a distance measure in the space of models, corresponding to the amount of adjustment to model parameters required to properly represent given inputs.
Sounds very closely related to gradient-based OOD detection methods; see https://arxiv.org/abs/2008.08030
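(For concreteness, the gradient-based scoring this line of work builds on usually boils down to something like the sketch below: backprop a loss through the model and use the size of the resulting parameter gradients as the score. The pseudo-label trick and the plain L2 norm here are my own simplifications for illustration, not the exact recipe from the linked paper.)

```python
import torch
import torch.nn.functional as F

def gradient_ood_score(model, x):
    """Score a batch by how large a parameter adjustment the model would
    need to fit its own predictions; larger gradients ~ more out-of-distribution."""
    model.zero_grad()
    logits = model(x)
    # No true labels at test time, so use a stand-in target
    # (here: the model's own argmax prediction; the paper uses a
    # different construction).
    pseudo_labels = logits.argmax(dim=1)
    loss = F.cross_entropy(logits, pseudo_labels)
    loss.backward()
    # L2 norm over all parameter gradients as the OOD score.
    score = torch.sqrt(sum((p.grad ** 2).sum()
                           for p in model.parameters()
                           if p.grad is not None))
    model.zero_grad()
    return score.item()
```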
Looks kinda similar, I guess. But their methods require you to know the labels, to run backprop, and to know your model’s loss function, and it doesn’t look like they’d work on arbitrarily-specified submodules of a given model, only the model as a whole.
The approach in my post is dirt-cheap, straightforward, and it Just Works™. In my experiments (as you can see in the code) I draw my “output” from the third-to-last convolutional layer. Why? Because it doesn’t matter: grab inscrutable vectors from the middle of the model, and it still works as you’d expect it to.
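To make that concrete, here is roughly the shape of the thing: no labels, no loss, no backprop, just intermediate activations compared against a reference computed from training data. The layer handle (`model.layer3`) and the mean-distance score are stand-ins for illustration, not the actual code from the post.

```python
import torch

class ActivationGrabber:
    """Capture the output of an arbitrary intermediate module via a
    forward hook; no labels, loss function, or backprop needed."""
    def __init__(self, module):
        self.activation = None
        module.register_forward_hook(self._hook)

    def _hook(self, module, inputs, output):
        # Flatten spatial dims so every input maps to one vector.
        self.activation = output.detach().flatten(start_dim=1)

def fit_reference(model, grabber, loader):
    """Mean intermediate activation over (a sample of) the training data;
    labels in the loader are ignored."""
    feats = []
    with torch.no_grad():
        for x, _ in loader:
            model(x)
            feats.append(grabber.activation)
    return torch.cat(feats).mean(dim=0)

def novelty_score(model, grabber, reference, x):
    """Distance of an input's intermediate activation from the
    training-set reference; larger = more unfamiliar."""
    with torch.no_grad():
        model(x)
    return torch.norm(grabber.activation - reference, dim=1)

# Usage sketch: hook any layer you like, e.g. a mid-network conv block.
# grabber = ActivationGrabber(model.layer3)
# reference = fit_reference(model, grabber, train_loader)
# scores = novelty_score(model, grabber, reference, test_batch)
```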