Tomek Korbak

Karma: 729

Senior Research Scientist at UK AISI working on frontier AI safety cases

https://tomekkorbak.com/

A sketch of an AI control safety case

Jan 30, 2025, 5:28 PM
58 points
0 comments
5 min read
LW link

Eliciting bad contexts

Jan 24, 2025, 10:39 AM
31 points
7 comments
3 min read
LW link

Automation collapse

Oct 21, 2024, 2:50 PM
72 points
9 comments
7 min read
LW link

Compositional preference models for aligning LMs

Tomek Korbak, Oct 25, 2023, 12:17 PM
18 points
2 comments
5 min read
LW link

Towards Understanding Sycophancy in Language Models

Oct 24, 2023, 12:30 AM
66 points
0 comments
2 min read
LW link
(arxiv.org)

Paper: LLMs trained on “A is B” fail to learn “B is A”

Sep 23, 2023, 7:55 PM
120 points
74 comments
4 min read
LW link
(arxiv.org)

Paper: On measuring situational awareness in LLMs

Sep 4, 2023, 12:54 PM
109 points
16 comments
5 min read
LW link
(arxiv.org)

Imitation Learning from Language Feedback

Mar 30, 2023, 2:11 PM
71 points
3 comments
10 min read
LW link

Pretraining Language Models with Human Preferences

Feb 21, 2023, 5:57 PM
135 points
20 comments
11 min read
LW link
2 reviews