[Linkpost] Interpretable Analysis of Features Found in Open-source Sparse Autoencoder (partial replication)

Fernando Avalos, 9 Sep 2024 3:33 UTC

6 points

Sparse Autoencoders (SAEs), AI Research Agendas

This was an upskilling project I worked on over the past few months. Even though I don't think it is anything fancy or highly relevant to current research on SAEs, I find it valuable because I learned a lot and refined my understanding of how mechanistic interpretability fits into the bigger picture of AI Safety.

In the medium term, I hope to take on more challenging and impactful projects.

P.S.: brutally honest feedback is completely welcome :p

1 comment, 1 min read, LW link
