
Ethan Perez

Karma: 2,906

I’m a research scientist at Anthropic doing empirical safety research on language models. In the past, I’ve worked on automated red teaming of language models [1], the inverse scaling prize [2], learning from human feedback [3][4], and empirically testing debate [5][6], iterated amplification [7], and other methods [8] for scalably supervising AI systems as they become more capable.

Website: https://ethanperez.net/

Inverse Scaling Prize: Round 1 Winners

Sep 26, 2022, 7:57 PM
93 points
16 comments · 4 min read · LW link
(irmckenzie.co.uk)

We may be able to see sharp left turns coming

Sep 3, 2022, 2:55 AM
54 points
29 comments · 1 min read · LW link

A Test for Language Model Consciousness

Ethan Perez · Aug 25, 2022, 7:41 PM
18 points
14 comments · 9 min read · LW link