Jeremy Gillen

Karma: 1,939

I’m interested in doing in-depth dialogues to find cruxes. Message me if you are interested in doing this.

I do alignment research, mostly work that is loosely agent foundations. I am currently doing independent alignment research on ontology identification, and was formerly on Vivek’s team at MIRI. Most of my writing from before mid-2023 is not representative of my current views about alignment difficulty.

Detect Goodhart and shut down

Jeremy Gillen
Jan 22, 2025, 6:45 PM
68 points
21 comments, 7 min read

Context-dependent consequentialism

Nov 4, 2024, 9:29 AM
31 points
6 comments, 27 min read

Without fundamental advances, misalignment and catastrophe are the default outcomes of training powerful AI

Jan 26, 2024, 7:22 AM
161 points
60 comments, 57 min read

Thomas Kwa’s MIRI research experience

Oct 2, 2023, 4:42 PM
172 points
53 comments, 1 min read

AISC team report: Soft-optimization, Bayes and Goodhart

Jun 27, 2023, 6:05 AM
38 points
2 comments, 15 min read

Soft optimization makes the value target bigger

Jeremy Gillen
Jan 2, 2023, 4:06 PM
119 points
20 comments, 12 min read

Jeremy Gillen’s Shortform

Jeremy Gillen
Oct 19, 2022, 4:14 PM
6 points
46 comments

Neural Tangent Kernel Distillation

Oct 5, 2022, 6:11 PM
76 points
20 comments, 8 min read

Inner Alignment via Superpowers

Aug 30, 2022, 8:01 PM
37 points
13 comments, 4 min read

Finding Goals in the World Model

Aug 22, 2022, 6:06 PM
59 points
8 comments, 13 min read

The Core of the Alignment Problem is...

Aug 17, 2022, 8:07 PM
76 points
10 comments, 9 min read

Project proposal: Testing the IBP definition of agent

Aug 9, 2022, 1:09 AM
21 points
4 comments, 2 min read

Broad Basins and Data Compression

Aug 8, 2022, 8:33 PM
33 points
6 comments, 7 min read

Translating between Latent Spaces

Jul 30, 2022, 3:25 AM
27 points
2 comments, 8 min read

Explaining inner alignment to myself

Jeremy Gillen
May 24, 2022, 11:10 PM
9 points
2 comments, 10 min read

Goodhart’s Law Causal Diagrams

Apr 11, 2022, 1:52 PM
34 points
6 comments, 6 min read