Jatin Nainani comments on SAEs are highly dataset dependent: a case study on the refusal direction

Jatin Nainani 10 Nov 2024 21:16 UTC
1 point
Makes sense, thanks! In that case, we could potentially reduce the SAE width, which (along with a smaller dataset) might help scale SAEs to understanding mechanisms in big models?