
TurnTrout

Karma: 20,201

My name is Alex Turner. I’m a research scientist at Google DeepMind on the Scalable Alignment team. My views are strictly my own; I do not represent Google. Reach me at alex[at]turntrout.com

Self-fulfilling misalignment data might be poisoning our AI models

TurnTrout · Mar 2, 2025, 7:51 PM
149 points
22 comments · 1 min read · LW link
(turntrout.com)

Steering Gemini with BiDPO

TurnTrout · Jan 31, 2025, 2:37 AM
104 points
5 comments · 1 min read · LW link
(turntrout.com)

Insights from “The Manga Guide to Physiology”

TurnTrout · Jan 24, 2025, 5:18 AM
26 points
3 comments · 1 min read · LW link
(turntrout.com)

Deceptive Alignment and Homuncularity

Jan 16, 2025, 1:55 PM
25 points
12 comments · 22 min read · LW link

Gaming TruthfulQA: Simple Heuristics Exposed Dataset Weaknesses

TurnTrout · Jan 16, 2025, 2:14 AM
64 points
3 comments · 1 min read · LW link
(turntrout.com)