
evhub (Evan Hubinger)

Karma: 12,026

Evan Hubinger (he/him/his) (evanjhub@gmail.com)

I am a research scientist at Anthropic where I lead the Alignment Stress-Testing team. My posts and comments are my own and do not represent Anthropic’s positions, policies, strategies, or opinions.

Previously: MIRI, OpenAI

See: “Why I’m joining Anthropic”

Selected work:

Dependent Type Theory and Zero-Shot Reasoning

evhub · 11 Jul 2018 1:16 UTC
27 points
3 comments · 5 min read · LW link

Nuances with ascription universality

evhub · 12 Feb 2019 23:38 UTC
20 points
1 comment · 2 min read · LW link

A Concrete Proposal for Adversarial IDA

evhub · 26 Mar 2019 19:50 UTC
16 points
5 comments · 5 min read · LW link

Risks from Learned Optimization: Introduction

31 May 2019 23:44 UTC
184 points
42 comments · 12 min read · LW link · 3 reviews

Conditions for Mesa-Optimization

1 Jun 2019 20:52 UTC
84 points
48 comments · 12 min read · LW link

The Inner Alignment Problem

4 Jun 2019 1:20 UTC
103 points
17 comments · 13 min read · LW link