Erik Jenner

Karma: 2,043

Research Scientist on the Google DeepMind AGI Safety & Alignment team

Evidence of Learned Look-Ahead in a Chess-Playing Neural Network

Erik Jenner, Jun 4, 2024, 3:50 PM
121 points
14 comments, 13 min read

Concrete empirical research projects in mechanistic anomaly detection

Apr 3, 2024, 11:07 PM
43 points
3 comments, 10 min read

A gentle introduction to mechanistic anomaly detection

Erik Jenner, Apr 3, 2024, 11:06 PM
73 points
2 comments, 11 min read

CHAI internship applications are open (due Nov 13)

Erik Jenner, Oct 26, 2023, 12:53 AM
34 points
0 comments, 3 min read

A comparison of causal scrubbing, causal abstractions, and related methods

Jun 8, 2023, 11:40 PM
73 points
3 comments, 22 min read

[Appendix] Natural Abstractions: Key Claims, Theorems, and Critiques

Mar 16, 2023, 4:38 PM
48 points
0 comments, 13 min read

Natural Abstractions: Key claims, Theorems, and Critiques

Mar 16, 2023, 4:37 PM
241 points
23 comments, 45 min read, 3 reviews

Sydney can play chess and kind of keep track of the board state

Erik Jenner, Mar 3, 2023, 9:39 AM
64 points
19 comments, 6 min read

Research agenda: Formalizing abstractions of computations

Erik Jenner, Feb 2, 2023, 4:29 AM
93 points
10 comments, 31 min read

Abstractions as morphisms between (co)algebras

Erik Jenner, Jan 14, 2023, 1:51 AM
17 points
1 comment, 8 min read

Subsets and quotients in interpretability

Erik Jenner, Dec 2, 2022, 11:13 PM
26 points
1 comment, 7 min read

ARC paper: Formalizing the presumption of independence

Erik Jenner, Nov 20, 2022, 1:22 AM
97 points
2 comments, 2 min read
(arxiv.org)

Response to Katja Grace’s AI x-risk counterarguments

Oct 19, 2022, 1:17 AM
77 points
18 comments, 15 min read

Disentangling inner alignment failures

Erik Jenner, Oct 10, 2022, 6:50 PM
23 points
5 comments, 4 min read

Good ontologies induce commutative diagrams

Erik Jenner, Oct 9, 2022, 12:06 AM
49 points
5 comments, 14 min read

How are you dealing with ontology identification?

Erik Jenner, Oct 4, 2022, 11:28 PM
34 points
10 comments, 3 min read

Breaking down the training/deployment dichotomy

Erik Jenner, Aug 28, 2022, 9:45 PM
30 points
3 comments, 3 min read

Reward model hacking as a challenge for reward learning

Erik Jenner, Apr 12, 2022, 9:39 AM
25 points
1 comment, 9 min read

The (not so) paradoxical asymmetry between position and momentum

Erik Jenner, Mar 28, 2021, 1:31 PM
21 points
10 comments, 4 min read

ejenner’s Shortform

Erik Jenner, Jul 28, 2020, 10:42 AM
2 points
35 comments