jake_mendel

Karma: 1,243

Technical AI safety program associate at Open Phil

Research directions Open Phil wants to fund in technical AI safety

jake_mendel, maxnadeau and Peter Favaloro

Feb 8, 2025, 1:40 AM

117 points

21 comments 58 min read LW link

(www.openphilanthropy.org)

Open Philanthropy Technical AI Safety RFP - $40M Available Across 21 Research Areas

jake_mendel, maxnadeau and Peter Favaloro

Feb 6, 2025, 6:58 PM

111 points

0 comments 1 min read LW link

(www.openphilanthropy.org)

Attribution-based parameter decomposition

Lucius Bushnaq, Dan Braun, StefanHex, jake_mendel and Lee Sharkey

Jan 25, 2025, 1:12 PM

107 points

21 comments 4 min read LW link

(publications.apolloresearch.ai)

Circuits in Superposition: Compressing many small neural networks into one

Lucius Bushnaq and jake_mendel

Oct 14, 2024, 1:06 PM

130 points

9 comments 13 min read LW link

jake_mendel’s Shortform

jake_mendel

Sep 19, 2024, 10:37 AM

5 points

3 comments LW link

[Interim research report] Activation plateaus & sensitive directions in GPT2

StefanHex and jake_mendel

Jul 5, 2024, 5:05 PM

65 points

2 comments 5 min read LW link

SAE feature geometry is outside the superposition hypothesis

jake_mendel

Jun 24, 2024, 4:07 PM

228 points

17 comments 11 min read LW link

Apollo Research 1-year update

Marius Hobbhahn, Lee Sharkey, Lucius Bushnaq, Dan Braun, Mikita Balesni, Jérémy Scheurer, Nicholas Goldowsky-Dill, StefanHex, jake_mendel, AlexMeinke and rusheb

May 29, 2024, 5:44 PM

93 points

0 comments 7 min read LW link

Interpretability: Integrated Gradients is a decent attribution method

Lucius Bushnaq, jake_mendel, StefanHex and Kaarel

May 20, 2024, 5:55 PM

23 points

7 comments 6 min read LW link

The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks

Lucius Bushnaq, jake_mendel, Dan Braun, StefanHex, Nicholas Goldowsky-Dill, Kaarel, Avery, Joern Stoehler, debrevitatevitae, Magdalena Wache and Marius Hobbhahn

May 20, 2024, 5:53 PM

107 points

4 comments 3 min read LW link

A starting point for making sense of task structure (in machine learning)

Kaarel, RP and jake_mendel

Feb 24, 2024, 1:51 AM

45 points

2 comments 12 min read LW link

Toward A Mathematical Framework for Computation in Superposition

Dmitry Vaintrob, jake_mendel and Kaarel

Jan 18, 2024, 9:06 PM

204 points

18 comments 63 min read LW link