Jannik Brinkmann

Karma: 137

Evaluating Sparse Autoencoders with Board Game Models

Aug 2, 2024, 7:50 PM
38 points
1 comment · 9 min read · LW link

Interpreting Preference Models w/ Sparse Autoencoders

Jul 1, 2024, 9:35 PM
74 points
12 comments · 9 min read · LW link

Finding Backward Chaining Circuits in Transformers Trained on Tree Search

May 28, 2024, 5:29 AM
50 points
1 comment · 9 min read · LW link
(arxiv.org)

Improving SAE’s by Sqrt()-ing L1 & Removing Lowest Activating Features

Mar 15, 2024, 4:30 PM
26 points
5 comments · 4 min read · LW link