I guess this proves the superiority of the mechanistic interpretability technique “note that it is mechanistically possible for your model to say that things are gorillas” :P
I guess this proves the superiority of the mechanistic interpretability technique “note that it is mechanistically possible for your model to say that things are gorillas” :P