Leon Lang

Karma: 1,375

I’m a PhD student at the University of Amsterdam. I have research experience in multivariate information theory and equivariant deep learning, and I recently became very interested in AI alignment. https://langleon.github.io/

[Paper Blogpost] When Your AIs Deceive You: Challenges with Partial Observability in RLHF

Leon Lang, 22 Oct 2024 13:57 UTC
47 points
0 comments, 18 min read, LW link
(arxiv.org)

We Should Prepare for a Larger Representation of Academia in AI Safety

Leon Lang, 13 Aug 2023 18:03 UTC
90 points
13 comments, 5 min read, LW link

Andrew Ng wants to have a conversation about extinction risk from AI

Leon Lang, 5 Jun 2023 22:29 UTC
32 points
2 comments, 1 min read, LW link
(twitter.com)

Evaluating Language Model Behaviours for Shutdown Avoidance in Textual Scenarios

16 May 2023 10:53 UTC
26 points
0 comments, 13 min read, LW link

[Appendix] Natural Abstractions: Key Claims, Theorems, and Critiques

16 Mar 2023 16:38 UTC
48 points
0 comments, 13 min read, LW link

Natural Abstractions: Key claims, Theorems, and Critiques

16 Mar 2023 16:37 UTC
228 points
20 comments, 45 min read, LW link

Andrew Huberman on How to Optimize Sleep

Leon Lang, 2 Feb 2023 20:17 UTC
37 points
6 comments, 6 min read, LW link

Experiment Idea: RL Agents Evading Learned Shutdownability

Leon Lang, 16 Jan 2023 22:46 UTC
31 points
7 comments, 17 min read, LW link
(docs.google.com)

Disentangling Shard Theory into Atomic Claims

Leon Lang, 13 Jan 2023 4:23 UTC
86 points
6 comments, 18 min read, LW link

Citability of Lesswrong and the Alignment Forum

Leon Lang, 8 Jan 2023 22:12 UTC
48 points
2 comments, 1 min read, LW link

A Short Dialogue on the Meaning of Reward Functions

19 Nov 2022 21:04 UTC
45 points
0 comments, 3 min read, LW link

Leon Lang’s Shortform

Leon Lang, 2 Oct 2022 10:05 UTC
2 points
55 comments, 1 min read, LW link

Distribution Shifts and The Importance of AI Safety

Leon Lang, 29 Sep 2022 22:38 UTC
17 points
2 comments, 12 min read, LW link

Summaries: Alignment Fundamentals Curriculum

Leon Lang, 18 Sep 2022 13:08 UTC
44 points
3 comments, 1 min read, LW link
(docs.google.com)