Can you expect that the applications to interpretability would apply on inputs radically outside of distribution?
My naive intuition is that by taking derivatives are you only describing local behaviour.
(I am “shooting from the hip” epistemically)
Can you expect that the applications to interpretability would apply on inputs radically outside of distribution?
My naive intuition is that by taking derivatives are you only describing local behaviour.
(I am “shooting from the hip” epistemically)