lewis smith

Karma: 719

Negative Results for SAEs On Downstream Tasks and Deprioritising SAE Research (GDM Mech Interp Team Progress Update #2)

Mar 26, 2025, 7:07 PM
109 points
15 comments
29 min read
LW link
(deepmindsafetyresearch.medium.com)

A Problem to Solve Before Building a Deception Detector

Feb 7, 2025, 7:35 PM
65 points
9 comments
14 min read
LW link

lewis smith’s Shortform

lewis smith
Aug 30, 2024, 9:51 AM
12 points
7 comments
LW link

The ‘strong’ feature hypothesis could be wrong

lewis smith
Aug 2, 2024, 2:33 PM
231 points
19 comments
17 min read
LW link

Improving Dictionary Learning with Gated Sparse Autoencoders

Apr 25, 2024, 6:43 PM
63 points
38 comments
1 min read
LW link
(arxiv.org)

[Full Post] Progress Update #1 from the GDM Mech Interp Team

Apr 19, 2024, 7:06 PM
79 points
10 comments
8 min read
LW link

[Summary] Progress Update #1 from the GDM Mech Interp Team

Apr 19, 2024, 7:06 PM
72 points
0 comments
3 min read
LW link

Dropout can create a privileged basis in the ReLU output model.

lewis smith
Apr 28, 2023, 1:59 AM
24 points
3 comments
5 min read
LW link