RSS

Jeremy Gillen

Karma: 1,619

I’m interested in doing in-depth dialogues to find cruxes. Message me if you are interested in doing this.

I do alignment research, mostly stuff that is vaguely agent foundations. Currently doing independent alignment research on ontology identification. Formerly on Vivek’s team at MIRI. Most of my writing before mid 2023 is not representative of my current views about alignment difficulty.

Con­text-de­pen­dent consequentialism

4 Nov 2024 9:29 UTC
31 points
6 comments27 min readLW link

Without fun­da­men­tal ad­vances, mis­al­ign­ment and catas­tro­phe are the de­fault out­comes of train­ing pow­er­ful AI

26 Jan 2024 7:22 UTC
161 points
60 comments57 min readLW link

Thomas Kwa’s MIRI re­search experience

2 Oct 2023 16:42 UTC
172 points
53 comments1 min readLW link

AISC team re­port: Soft-op­ti­miza­tion, Bayes and Goodhart

27 Jun 2023 6:05 UTC
37 points
2 comments15 min readLW link

Soft op­ti­miza­tion makes the value tar­get bigger

Jeremy Gillen2 Jan 2023 16:06 UTC
117 points
20 comments12 min readLW link

Jeremy Gillen’s Shortform

Jeremy Gillen19 Oct 2022 16:14 UTC
6 points
46 comments1 min readLW link

Neu­ral Tan­gent Ker­nel Distillation

5 Oct 2022 18:11 UTC
76 points
20 comments8 min readLW link

In­ner Align­ment via Superpowers

30 Aug 2022 20:01 UTC
37 points
13 comments4 min readLW link

Find­ing Goals in the World Model

22 Aug 2022 18:06 UTC
59 points
8 comments13 min readLW link

The Core of the Align­ment Prob­lem is...

17 Aug 2022 20:07 UTC
76 points
10 comments9 min readLW link

Pro­ject pro­posal: Test­ing the IBP defi­ni­tion of agent

9 Aug 2022 1:09 UTC
21 points
4 comments2 min readLW link

Broad Bas­ins and Data Compression

8 Aug 2022 20:33 UTC
33 points
6 comments7 min readLW link

Trans­lat­ing be­tween La­tent Spaces

30 Jul 2022 3:25 UTC
27 points
2 comments8 min readLW link

Ex­plain­ing in­ner al­ign­ment to myself

Jeremy Gillen24 May 2022 23:10 UTC
9 points
2 comments10 min readLW link

Good­hart’s Law Causal Diagrams

11 Apr 2022 13:52 UTC
34 points
5 comments6 min readLW link