
Evan Anders

Karma: 114

Postdoc at KITP studying AI safety / Mech Interp. Former astrophysical fluid dynamicist. Website: https://evanhanders.bitbucket.io/

Crafting Polysemantic Transformer Benchmarks with Known Circuits

Aug 23, 2024, 10:03 PM
10 points
0 comments · 25 min read · LW link

Sparse autoencoders find composed features in small toy models

Mar 14, 2024, 6:00 PM
33 points
12 comments · 15 min read · LW link

Examining Language Model Performance with Reconstructed Activations using Sparse Autoencoders

Feb 27, 2024, 2:43 AM
43 points
16 comments · 15 min read · LW link

How polysemantic can one neuron be? Investigating features in TinyStories.

Evan Anders · Jan 16, 2024, 7:10 PM
14 points
0 comments · 8 min read · LW link
(evanhanders.blog)

How does a toy 2 digit subtraction transformer predict the difference?

Evan Anders · Dec 22, 2023, 9:17 PM
12 points
0 comments · 10 min read · LW link
(evanhanders.blog)

How does a toy 2 digit subtraction transformer predict the sign of the output?

Evan Anders · Dec 19, 2023, 6:56 PM
14 points
0 comments · 8 min read · LW link
(evanhanders.blog)