Erik Jenner
Karma: 1,815
PhD student in AI safety at CHAI (UC Berkeley)
Evidence of Learned Look-Ahead in a Chess-Playing Neural Network
Erik Jenner | 4 Jun 2024 15:50 UTC | 120 points | 14 comments | 13 min read | LW
Concrete empirical research projects in mechanistic anomaly detection
Erik Jenner, Viktor Rehnberg and Oliver Daniels | 3 Apr 2024 23:07 UTC | 43 points | 3 comments | 10 min read | LW
A gentle introduction to mechanistic anomaly detection
Erik Jenner | 3 Apr 2024 23:06 UTC | 71 points | 2 comments | 11 min read | LW
CHAI internship applications are open (due Nov 13)
Erik Jenner | 26 Oct 2023 0:53 UTC | 34 points | 0 comments | 3 min read | LW
A comparison of causal scrubbing, causal abstractions, and related methods
Erik Jenner, Adrià Garriga-alonso and Egor Zverev | 8 Jun 2023 23:40 UTC | 73 points | 3 comments | 22 min read | LW
[Appendix] Natural Abstractions: Key Claims, Theorems, and Critiques
LawrenceC, Erik Jenner and Leon Lang | 16 Mar 2023 16:38 UTC | 48 points | 0 comments | 13 min read | LW
Natural Abstractions: Key claims, Theorems, and Critiques
LawrenceC, Leon Lang and Erik Jenner | 16 Mar 2023 16:37 UTC | 237 points | 23 comments | 45 min read | LW | 3 reviews
Sydney can play chess and kind of keep track of the board state
Erik Jenner | 3 Mar 2023 9:39 UTC | 64 points | 19 comments | 6 min read | LW
Research agenda: Formalizing abstractions of computations
Erik Jenner | 2 Feb 2023 4:29 UTC | 92 points | 10 comments | 31 min read | LW
Abstractions as morphisms between (co)algebras
Erik Jenner | 14 Jan 2023 1:51 UTC | 17 points | 1 comment | 8 min read | LW
Subsets and quotients in interpretability
Erik Jenner | 2 Dec 2022 23:13 UTC | 26 points | 1 comment | 7 min read | LW
ARC paper: Formalizing the presumption of independence
Erik Jenner | 20 Nov 2022 1:22 UTC | 97 points | 2 comments | 2 min read | LW | link (arxiv.org)
Response to Katja Grace’s AI x-risk counterarguments
Erik Jenner and Johannes Treutlein | 19 Oct 2022 1:17 UTC | 77 points | 18 comments | 15 min read | LW
Disentangling inner alignment failures
Erik Jenner | 10 Oct 2022 18:50 UTC | 23 points | 5 comments | 4 min read | LW
Good ontologies induce commutative diagrams
Erik Jenner | 9 Oct 2022 0:06 UTC | 49 points | 5 comments | 14 min read | LW
How are you dealing with ontology identification?
Erik Jenner | 4 Oct 2022 23:28 UTC | 34 points | 10 comments | 3 min read | LW
Breaking down the training/deployment dichotomy
Erik Jenner | 28 Aug 2022 21:45 UTC | 30 points | 3 comments | 3 min read | LW
Reward model hacking as a challenge for reward learning
Erik Jenner | 12 Apr 2022 9:39 UTC | 25 points | 1 comment | 9 min read | LW
The (not so) paradoxical asymmetry between position and momentum
Erik Jenner | 28 Mar 2021 13:31 UTC | 21 points | 10 comments | 4 min read | LW
ejenner’s Shortform
Erik Jenner | 28 Jul 2020 10:42 UTC | 2 points | 27 comments | 1 min read | LW