Leaving MIRI, Seeking Funding

This is slightly old news at this point, but: as part of MIRI’s recent strategy pivot, the Agent Foundations research team has been eliminated. I’ve been out of a job for a little over a month now. Much of my research time in the first half of the year was eaten up by engaging with the decision process that led to this, and later by applying for grants and looking for jobs.

I haven’t secured funding yet, but for my own sanity & happiness, I am (mostly) taking a break from worrying about that, and getting back to thinking about the most important things.

However, in an effort to try the obvious, I have set up a Patreon where you can fund my work directly. I don’t expect it to become my main source of income, but if it does, that could be a pretty good scenario for me; it would be much nicer to get money directly from a bunch of people who think my work is good and important than to have to justify my work regularly in grant applications.

What I’m (probably) Doing Going Forward

I’ve been told by several people, both within MIRI and outside it, that it seems better for me to do roughly what I’ve been doing rather than pivot to something else. As such, I mainly expect to continue doing Agent Foundations research.

I think of my main research program as the Tiling Agents program. You can think of this as the question of when agents will preserve certain desirable properties (such as safety-relevant properties) when given the opportunity to self-modify. Another way to think about it is the slightly broader question: when can one intelligence trust another? The bottleneck for avoiding harmful self-modifications is self-trust; so getting tiling results is mainly a matter of finding conditions for trust.
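
To make the self-trust bottleneck a bit more concrete, here is the classic Löbian picture from the proof-based tiling literature (an illustrative sketch only, not a statement of my current results). Suppose an agent reasons in a formal theory $T$ and builds a successor that also reasons in $T$. The naive trust condition is the soundness schema

$$\forall \phi:\quad T \vdash \Box_T \ulcorner \phi \urcorner \rightarrow \phi,$$

i.e. “if my successor proves the action is safe, then it is safe.” But Löb’s theorem says that $T \vdash \Box_T \ulcorner \phi \urcorner \rightarrow \phi$ only when $T \vdash \phi$ already, so a consistent agent cannot endorse this schema about its own theory. Tiling results can be seen as the search for weaker or restructured trust conditions that avoid this obstacle.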

The search for tiling results has two main motivations:

  • AI-AI tiling, for the purpose of finding conditions under which AI systems will want to preserve safety-relevant properties.

  • Human-AI tiling, for the purpose of understanding when we can justifiably trust AI systems.

While I see this as the biggest priority, I also expect to continue a broader project of deconfusion. The bottleneck to progress in AI safety continues to be our confusion about many of the relevant concepts, such as human values.

I’m also still interested in doing some work on accelerating AI safety research using modern AI.

Thoughts on Public vs Private Research

Some work that is worth doing should be done in a non-public, or even highly secretive, way.[1] However, my experience at MIRI has left me feeling somewhat burned out on highly secretive work. It is hard to see how secretive work can have a positive impact on the future (although the story for public work is also fraught). At MIRI, there was always the idea that if we came up with something sufficiently good, something would happen with it, although what exactly that would be was unclear, at least to me.

Secretive research also lacks feedback loops that public research has. My impression is that this slows down the research significantly (contrary to some views at MIRI).

In any case, I personally hope to make my research more open and accessible going forward, although this may depend on my future employer. This means writing more on LessWrong and the Alignment Forum, and perhaps writing academic papers.

As part of this, I hope to hold more of my research video calls as publicly-accessible discussions. I’ve been experimenting with this a little bit and I feel it has been going well so far.


If you’d like to fund my work directly, you can do so via Patreon.

  1. ^

    Roughly, I mean dangerous AI capabilities work, although the “capabilities vs safety” dichotomy is somewhat fraught.