
Lee Sharkey

Karma: 1,712

Apollo Research (London).

My main research interests are mechanistic interpretability and inner alignment.

Gated Attention Blocks: Preliminary Progress toward Removing Attention Head Superposition

Apr 8, 2024, 11:14 AM
42 points
4 comments · 15 min read · LW link

Sparsify: A mechanistic interpretability research agenda

Lee Sharkey · Apr 3, 2024, 12:34 PM
96 points
23 comments · 22 min read · LW link

Addressing Feature Suppression in SAEs

Feb 16, 2024, 6:32 PM
86 points
4 comments · 10 min read · LW link

Theories of Change for AI Auditing

Nov 13, 2023, 7:33 PM
54 points
0 comments · 18 min read · LW link
(www.apolloresearch.ai)

Announcing Apollo Research

May 30, 2023, 4:17 PM
217 points
11 comments · 8 min read · LW link