Yeah, I agree! You definitely shouldn't think of the unembed as looking for “the closest token”; it's looking for the token with the largest dot product (which rewards both high cosine similarity and a large norm).
I suspect the piece would be helpful for people with similar confusions, though I think most people already think of features as directions by default (it's an incredibly common tacit assumption across mech interp work), especially since the embed/unembed are linear functions.
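
To make the dot product point concrete, here's a toy sketch. Everything in it is made up for illustration: the 2-d residual vector, the two token names, and their unembedding vectors are hypothetical, and it ignores the unembed bias and final layer norm. It just shows that the token nearest in Euclidean distance (and highest in cosine similarity) can still lose on the logit to a token with a larger norm.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def cosine(a, b):
    return dot(a, b) / (norm(a) * norm(b))

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical toy values, not taken from any real model.
resid = [1.0, 0.0]          # final residual stream vector
unembed = {
    "near": [1.0, 0.2],     # almost parallel to resid, small norm
    "big": [3.0, 3.0],      # 45 degrees off, but much larger norm
}

# "Closest token" picture: smallest Euclidean distance.
closest = min(unembed, key=lambda t: distance(resid, unembed[t]))

# What a linear unembed computes: the largest dot product (logit).
top_logit = max(unembed, key=lambda t: dot(resid, unembed[t]))

for tok, vec in unembed.items():
    print(tok,
          "dist=%.3f" % distance(resid, vec),
          "cos=%.3f" % cosine(resid, vec),
          "logit=%.3f" % dot(resid, vec))
print("closest:", closest, "| top logit:", top_logit)
```

Here "near" wins on both distance and cosine similarity, but "big" gets the higher logit, since the logit is cosine similarity times the product of the two norms.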