This is neat, nice work!
I’m finding it quite hard to get a sense of what the actual Loss Recovered numbers you report are, and to compare them concretely to other work. If possible, it’d be very helpful if you shared:
What the zero-ablation CE scores are for each model and SAE position. (I assume it’s much worse for the MLP and attention outputs than for the residual stream?)
What the baseline CE scores are for each model.
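For context on why both numbers matter: a common way to define Loss Recovered normalizes the CE loss with the SAE spliced in by the gap between the zero-ablation loss and the clean baseline loss. Whether the paper uses exactly this definition is an assumption here, and the function name and all numbers below are hypothetical, chosen only to illustrate the arithmetic.

```python
def loss_recovered(ce_clean: float, ce_sae: float, ce_zero: float) -> float:
    """Fraction of the zero-ablation loss gap recovered by the SAE.

    ce_clean: CE loss of the unmodified model (baseline).
    ce_sae:   CE loss with the SAE reconstruction spliced in.
    ce_zero:  CE loss with the activation at that site zero-ablated.
    Assumes the commonly used definition, not necessarily the paper's.
    """
    gap = ce_zero - ce_clean
    if gap <= 0:
        raise ValueError("zero-ablation CE must exceed the clean baseline CE")
    return (ce_zero - ce_sae) / gap


# Hypothetical values: the same clean and SAE losses at two sites,
# differing only in how damaging zero ablation is at each site.
site_large_gap = loss_recovered(ce_clean=3.0, ce_sae=3.5, ce_zero=8.0)
site_small_gap = loss_recovered(ce_clean=3.0, ce_sae=3.5, ce_zero=4.0)
```

With these made-up values, the same increase in CE (3.0 to 3.5) gives a Loss Recovered of 0.9 when zero ablation is very damaging and 0.5 when it is mild, which is why the raw baseline and zero-ablation CE scores are needed to compare across sites and papers.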
Thanks for the feedback. We will update the paper tomorrow night with all of these numbers in tables. For now I have sent them to you (and can send them to anyone else who wants them in the next 24 hours).