Jeremy Gillen

Karma: 1,939

I’m interested in doing in-depth dialogues to find cruxes. Message me if you are interested in doing this.

I do alignment research, mostly work that is loosely agent foundations. Currently doing independent alignment research on ontology identification. Formerly on Vivek’s team at MIRI. Most of my writing before mid-2023 is not representative of my current views about alignment difficulty.

Detect Goodhart and shut down

Jeremy Gillen

Jan 22, 2025, 6:45 PM

68 points

21 comments, 7 min read

Context-dependent consequentialism

Jeremy Gillen and mattmacdermott

Nov 4, 2024, 9:29 AM

31 points

6 comments, 27 min read

Without fundamental advances, misalignment and catastrophe are the default outcomes of training powerful AI

Jeremy Gillen and peterbarnett

Jan 26, 2024, 7:22 AM

161 points

60 comments, 57 min read

Thomas Kwa’s MIRI research experience

Thomas Kwa, peterbarnett, Vivek Hebbar, Jeremy Gillen, Bird Concept and Raemon

Oct 2, 2023, 4:42 PM

172 points

53 comments, 1 min read

AISC team report: Soft-optimization, Bayes and Goodhart

Simon Fischer, benjaminko, jazcarretao, DFNaiff and Jeremy Gillen

Jun 27, 2023, 6:05 AM

38 points

2 comments, 15 min read

Soft optimization makes the value target bigger

Jeremy Gillen

Jan 2, 2023, 4:06 PM

119 points

20 comments, 12 min read

Jeremy Gillen’s Shortform

Jeremy Gillen

Oct 19, 2022, 4:14 PM

6 points

46 comments

Neural Tangent Kernel Distillation

Thomas Larsen and Jeremy Gillen

Oct 5, 2022, 6:11 PM

76 points

20 comments, 8 min read

Inner Alignment via Superpowers

JamesH, Thomas Larsen and Jeremy Gillen

Aug 30, 2022, 8:01 PM

37 points

13 comments, 4 min read

Finding Goals in the World Model

Jeremy Gillen, JamesH and Thomas Larsen

Aug 22, 2022, 6:06 PM

59 points

8 comments, 13 min read

The Core of the Alignment Problem is...

Thomas Larsen, Jeremy Gillen and JamesH

Aug 17, 2022, 8:07 PM

76 points

10 comments, 9 min read

Project proposal: Testing the IBP definition of agent

Jeremy Gillen, Thomas Larsen and JamesH

Aug 9, 2022, 1:09 AM

21 points

4 comments, 2 min read

Broad Basins and Data Compression

Jeremy Gillen, Stephen Fowler and Thomas Larsen

Aug 8, 2022, 8:33 PM

33 points

6 comments, 7 min read

Translating between Latent Spaces

JamesH, Jeremy Gillen and NickyP

Jul 30, 2022, 3:25 AM

27 points

2 comments, 8 min read

Explaining inner alignment to myself

Jeremy Gillen

May 24, 2022, 11:10 PM

9 points

2 comments, 10 min read

Goodhart’s Law Causal Diagrams

JustinShovelain and Jeremy Gillen

Apr 11, 2022, 1:52 PM

34 points

6 comments, 6 min read