What I’ll be doing at MIRI
Note: This is a personal post describing my own plans, not a post with actual research content.
Having finished my internship working with Paul Christiano and others at OpenAI, I’ll be moving to doing research at MIRI. I’ve decided to do research at MIRI because I believe MIRI will be the easiest, most convenient place for me to continue doing research in the near future. That being said, there are a couple of particular aspects of what I’ll be doing at MIRI that I think are worth being explicit about.
First, and most importantly, this decision does not represent any substantive change in my beliefs regarding AI safety. In particular, my research continues to be focused on solving inner alignment for amplification. My post on relaxed adversarial training continues to represent a fairly up-to-date picture of what I think needs to be done along these lines.
Second, my research will remain public by default. I have discussed with MIRI their decision to make their research non-disclosed-by-default and we agreed that my research agenda is a reasonable exception. I strongly believe in the importance of collaborating with both the AI safety and machine learning communities and thus believe in the need for sharing research. Of course, I also fully believe in the importance of carefully reviewing possible harmful effects from publishing before disclosing results—and will continue to do so with all of my research—though I will attempt to publish anything I don’t believe to pose a meaningful risk.
Third—and this should go without saying—I fully anticipate continuing to collaborate with other researchers at other institutions such as OpenAI, Ought, CHAI, DeepMind, FHI, etc. The task of making AGI safe is a huge endeavor that I fully believe will require the joint work of an entire field. If you are interested in working with me on anything (regarding inner alignment or anything else) please don’t hesitate to send me an email at evanjhub@gmail.com.
Small note: my view of MIRI’s nondisclosed-by-default policy is that if all researchers involved with a research program think it should obviously be public then it should obviously be public, and that doesn’t require a bunch of bureaucracy. I think this while simultaneously predicting that when researchers have a part of themselves that feels uncertain or uneasy about whether their research should be public, they will find that there are large benefits to instituting a nondisclosed-by-default policy. But the policy is there to enable researchers, not to annoy them and make them jump through hoops.
(Caveat: within ML, it’s still rare for risk-based nondisclosure to be treated as a real option, and many social incentives favor publishing-by-default. I want to be very clear that within the context of those incentives, I expect many people to jump to “this seems obviously safe to me” when the evidence doesn’t warrant it. I think it’s important to facilitate an environment where it’s not just OK-on-paper but also socially-hedonic to decide against publishing, and I think that these decisions often warrant serious thought. The aim of MIRI’s disclosure policy is to remove undue pressures to make publication decisions prematurely, not to override researchers’ considered conclusions.)
Working full time on research, having a stable salary, and being in a geographical location conducive to talking with a lot of other thoughtful people who think a lot about these topics are all very valuable things, and I’m pleased to hear they’re happening for you :-)
On the subject of privacy, I was recently reading the career plan of a friend who was looking for jobs in AI alignment, and I wrote this:
It’s really great to hear that you’ll continue writing publicly, as I think the stuff you’re doing is important and exciting, and there are strong distributed benefits for the broader landscape of people who are working on AI alignment or who want to.
Also feel free to come downstairs and hang out with us in the LessWrong offices :-)
Congratulations! :)
Do come visit our office in your basement sometime.
[Meta] At the moment, Oliver’s comment has 15 karma across 1 vote (and 6 AF karma). If I’m understanding LW’s voting system correctly, the only way this could have happened is if Oliver undid his default vote on the comment, and then Eliezer Yudkowsky did a strong-upvote on the comment (see here for a list of users by voting power). But my intuition says this scenario is implausible, so I’m curious what happened instead.
(This isn’t important, but I’m curious anyway.)
Probably not that important for this comment, but pretty important in terms of “is voting currently badly broken.” Thanks for the heads up!
We have some open bug reports/issues with the vote counter. When certain race conditions occur, the counter can get out of sync, so in this case the counter is just straightforwardly wrong. This occurs pretty rarely, so it hasn’t been at the top of our priority list.
The true number of votes on my comment is 4.
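For anyone curious what this kind of failure looks like, here is a minimal sketch (in TypeScript, purely illustrative and not LessWrong’s actual code, with made-up names like `storedVoteCount` and `castVote`) of how a read-modify-write race can leave a denormalized counter out of sync with the true vote total:

```typescript
// Illustrative only: a denormalized vote counter updated with a
// read-modify-write pattern. If two votes are processed concurrently,
// both read the same old value and one increment is lost, so the stored
// count no longer matches the true number of votes.

let storedVoteCount = 0; // denormalized counter kept alongside the comment

async function castVote(): Promise<void> {
  const current = storedVoteCount; // 1. read the cached count
  await new Promise<void>((resolve) => setTimeout(resolve, 10)); // 2. simulate database latency
  storedVoteCount = current + 1; // 3. write back, possibly clobbering a concurrent update
}

async function main(): Promise<void> {
  // Two votes arrive at nearly the same time.
  await Promise.all([castVote(), castVote()]);
  console.log(storedVoteCount); // prints 1, not 2: the counter is now out of sync
}

main();
```

The usual fixes for this general pattern are to make the increment atomic at the database level or to periodically recompute the counter from the underlying votes.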