Adam Karvonen

Karma: 369

Adam Karvonen’s Shortform

Adam Karvonen · Jan 18, 2025, 5:11 PM
4 points
1 comment · 1 min read · LW link

SAEBench: A Comprehensive Benchmark for Sparse Autoencoders

Dec 11, 2024, 6:30 AM
82 points
6 comments · 2 min read · LW link
(www.neuronpedia.org)

Evaluating Sparse Autoencoders with Board Game Models

Aug 2, 2024, 7:50 PM
38 points
1 comment · 9 min read · LW link

Using an LLM perplexity filter to detect weight exfiltration

Adam Karvonen · Jul 21, 2024, 6:18 PM
25 points
11 comments · 2 min read · LW link

OthelloGPT learned a bag of heuristics

Jul 2, 2024, 9:12 AM
110 points
10 comments · 9 min read · LW link

An Intuitive Explanation of Sparse Autoencoders for Mechanistic Interpretability of LLMs

Adam Karvonen · Jun 25, 2024, 3:57 PM
27 points
0 comments · 9 min read · LW link
(adamkarvonen.github.io)