evhub

Karma: 13,838

Evan Hubinger (he/him/his) (evanjhub@gmail.com)

I am a research scientist at Anthropic where I lead the Alignment Stress-Testing team. My posts and comments are my own and do not represent Anthropic’s positions, policies, strategies, or opinions.

Previously: MIRI, OpenAI

See: “Why I’m joining Anthropic”

Selected work:

Training on Documents About Reward Hacking Induces Reward Hacking

21 Jan 2025 21:32 UTC
131 points
13 comments · 2 min read · LW link
(alignment.anthropic.com)