
evhub

Karma: 13,368

Evan Hubinger (he/​him/​his) (evanjhub@gmail.com)

I am a research scientist at Anthropic where I lead the Alignment Stress-Testing team. My posts and comments are my own and do not represent Anthropic’s positions, policies, strategies, or opinions.

Previously: MIRI, OpenAI

See: “Why I’m joining Anthropic”

Selected work:

Alignment Faking in Large Language Models

18 Dec 2024 17:19 UTC
419 points
53 comments · 10 min read · LW link

Catastrophic sabotage as a major threat model for human-level AI systems

evhub · 22 Oct 2024 20:57 UTC
91 points
11 comments · 15 min read · LW link

Sabotage Evaluations for Frontier Models

18 Oct 2024 22:33 UTC
93 points
55 comments · 6 min read · LW link
(assets.anthropic.com)