Erik Jenner

Karma: 1,815

PhD student in AI safety at CHAI (UC Berkeley)

Evidence of Learned Look-Ahead in a Chess-Playing Neural Network

Erik Jenner, 4 Jun 2024 15:50 UTC
120 points
14 comments, 13 min read, LW link

Concrete empirical research projects in mechanistic anomaly detection

3 Apr 2024 23:07 UTC
43 points
3 comments, 10 min read, LW link

A gentle introduction to mechanistic anomaly detection

Erik Jenner, 3 Apr 2024 23:06 UTC
71 points
2 comments, 11 min read, LW link

CHAI internship applications are open (due Nov 13)

Erik Jenner, 26 Oct 2023 0:53 UTC
34 points
0 comments, 3 min read, LW link

A comparison of causal scrubbing, causal abstractions, and related methods

8 Jun 2023 23:40 UTC
73 points
3 comments, 22 min read, LW link

[Appendix] Natural Abstractions: Key Claims, Theorems, and Critiques

16 Mar 2023 16:38 UTC
48 points
0 comments, 13 min read, LW link

Natural Abstractions: Key claims, Theorems, and Critiques

16 Mar 2023 16:37 UTC
237 points
23 comments, 45 min read, LW link, 3 reviews

Sydney can play chess and kind of keep track of the board state

Erik Jenner, 3 Mar 2023 9:39 UTC
64 points
19 comments, 6 min read, LW link

Research agenda: Formalizing abstractions of computations

Erik Jenner, 2 Feb 2023 4:29 UTC
92 points
10 comments, 31 min read, LW link

Abstractions as morphisms between (co)algebras

Erik Jenner, 14 Jan 2023 1:51 UTC
17 points
1 comment, 8 min read, LW link

Subsets and quotients in interpretability

Erik Jenner, 2 Dec 2022 23:13 UTC
26 points
1 comment, 7 min read, LW link

ARC paper: Formalizing the presumption of independence

Erik Jenner, 20 Nov 2022 1:22 UTC
97 points
2 comments, 2 min read, LW link
(arxiv.org)

Response to Katja Grace’s AI x-risk counterarguments

19 Oct 2022 1:17 UTC
77 points
18 comments, 15 min read, LW link

Disentangling inner alignment failures

Erik Jenner, 10 Oct 2022 18:50 UTC
23 points
5 comments, 4 min read, LW link

Good ontologies induce commutative diagrams

Erik Jenner, 9 Oct 2022 0:06 UTC
49 points
5 comments, 14 min read, LW link

How are you dealing with ontology identification?

Erik Jenner, 4 Oct 2022 23:28 UTC
34 points
10 comments, 3 min read, LW link

Breaking down the training/deployment dichotomy

Erik Jenner, 28 Aug 2022 21:45 UTC
30 points
3 comments, 3 min read, LW link

Reward model hacking as a challenge for reward learning

Erik Jenner, 12 Apr 2022 9:39 UTC
25 points
1 comment, 9 min read, LW link

The (not so) paradoxical asymmetry between position and momentum

Erik Jenner, 28 Mar 2021 13:31 UTC
21 points
10 comments, 4 min read, LW link

ejenner’s Shortform

Erik Jenner, 28 Jul 2020 10:42 UTC
2 points
27 comments, 1 min read, LW link