
jake_mendel

Karma: 1,240

technical AI safety program associate at OpenPhil

Research directions Open Phil wants to fund in technical AI safety

Feb 8, 2025, 1:40 AM
116 points
21 comments · 58 min read
(www.openphilanthropy.org)

Open Philanthropy Technical AI Safety RFP - $40M Available Across 21 Research Areas

Feb 6, 2025, 6:58 PM
111 points
0 comments · 1 min read
(www.openphilanthropy.org)

Attribution-based parameter decomposition

Jan 25, 2025, 1:12 PM
107 points
21 comments · 4 min read
(publications.apolloresearch.ai)

Circuits in Superposition: Compressing many small neural networks into one

Oct 14, 2024, 1:06 PM
130 points
9 comments · 13 min read

jake_mendel’s Shortform

jake_mendel · Sep 19, 2024, 10:37 AM
5 points
3 comments

[Interim research report] Activation plateaus & sensitive directions in GPT2

Jul 5, 2024, 5:05 PM
65 points
2 comments · 5 min read

SAE feature geometry is outside the superposition hypothesis

jake_mendel · Jun 24, 2024, 4:07 PM
228 points
17 comments · 11 min read

Apollo Research 1-year update

May 29, 2024, 5:44 PM
93 points
0 comments · 7 min read

Interpretability: Integrated Gradients is a decent attribution method

May 20, 2024, 5:55 PM
23 points
7 comments · 6 min read

The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks

May 20, 2024, 5:53 PM
105 points
4 comments · 3 min read

A starting point for making sense of task structure (in machine learning)

Feb 24, 2024, 1:51 AM
45 points
2 comments · 12 min read

Toward A Mathematical Framework for Computation in Superposition

Jan 18, 2024, 9:06 PM
204 points
18 comments · 63 min read