Nice work! Two good points from the paper:
“Works should evaluate how their techniques perform on randomly or adversarially sampled tasks”
“...highlights a need for techniques that allow a user to discover failures that may not be in a typical dataset or easy to think of in advance. This represents one of the unique potential benefits of interpretability methods compared to other ways of evaluating models such as test performance”