Leon Lang

Karma: 1,480

I’m a last-year PhD student at the University of Amsterdam working on AI Safety and Alignment, specifically on the safety risks of Reinforcement Learning from Human Feedback (RLHF). Previously, I also worked on abstract multivariate information theory and equivariant deep learning. https://langleon.github.io/

[Paper Blogpost] When Your AIs Deceive You: Challenges with Partial Observability in RLHF

Leon Lang, Oct 22, 2024, 1:57 PM
51 points
2 comments, 18 min read
(arxiv.org)

We Should Prepare for a Larger Representation of Academia in AI Safety

Leon Lang, Aug 13, 2023, 6:03 PM
90 points
14 comments, 5 min read

Andrew Ng wants to have a conversation about extinction risk from AI

Leon Lang, Jun 5, 2023, 10:29 PM
32 points
2 comments, 1 min read
(twitter.com)

Evaluating Language Model Behaviours for Shutdown Avoidance in Textual Scenarios

May 16, 2023, 10:53 AM
26 points
0 comments, 13 min read

[Appendix] Natural Abstractions: Key Claims, Theorems, and Critiques

Mar 16, 2023, 4:38 PM
48 points
0 comments, 13 min read

Natural Abstractions: Key claims, Theorems, and Critiques

Mar 16, 2023, 4:37 PM
241 points
23 comments, 45 min read, 3 reviews

Andrew Huberman on How to Optimize Sleep

Leon Lang, Feb 2, 2023, 8:17 PM
37 points
6 comments, 6 min read

Experiment Idea: RL Agents Evading Learned Shutdownability

Leon Lang, Jan 16, 2023, 10:46 PM
31 points
7 comments, 17 min read
(docs.google.com)

Disentangling Shard Theory into Atomic Claims

Leon Lang, Jan 13, 2023, 4:23 AM
86 points
6 comments, 18 min read

Citability of Lesswrong and the Alignment Forum

Leon Lang, Jan 8, 2023, 10:12 PM
48 points
2 comments, 1 min read

A Short Dialogue on the Meaning of Reward Functions

Nov 19, 2022, 9:04 PM
45 points
0 comments, 3 min read

Leon Lang’s Shortform

Leon Lang, Oct 2, 2022, 10:05 AM
2 points
60 comments, 1 min read

Distribution Shifts and The Importance of AI Safety

Leon Lang, Sep 29, 2022, 10:38 PM
17 points
2 comments, 12 min read

Summaries: Alignment Fundamentals Curriculum

Leon Lang, Sep 18, 2022, 1:08 PM
44 points
3 comments, 1 min read
(docs.google.com)