Leaving MIRI, Seeking Funding

This is slightly old news at this point, but: as part of MIRI’s recent strategy pivot, the Agent Foundations research team has been eliminated. I’ve been out of a job for a little over a month now. Much of my research time in the first half of the year was eaten up by engaging with the decision process that led to this, and later by applying for grants and looking for jobs.

I haven’t secured funding yet, but for my own sanity & happiness, I am (mostly) taking a break from worrying about that, and getting back to thinking about the most important things.

However, in an effort to try the obvious, I have set up a Patreon where you can fund my work directly. I don’t expect it to become my main source of income, but if it does, that could be a pretty good scenario for me; it would be much nicer to get money directly from a bunch of people who think my work is good and important than to have to justify my work regularly in grant applications.

What I’m (probably) Doing Going Forward

I’ve been told by several people, both within MIRI and outside it, that it seems better for me to do roughly what I’ve been doing rather than pivot to something else. As such, I mainly expect to continue doing Agent Foundations research.

I think of my main research program as the Tiling Agents program. You can think of this as the question of when agents will preserve certain desirable properties (such as safety-relevant properties) when given the opportunity to self-modify. Another way to think about it is the slightly broader question: when can one intelligence trust another? The bottleneck for avoiding harmful self-modifications is self-trust; so getting tiling results is mainly a matter of finding conditions for trust.
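
To make the self-trust bottleneck a bit more concrete, here is the classic Löbian picture from the proof-based tiling literature (an illustrative sketch only, not a statement of my current results). Suppose an agent reasons in a formal theory $T$ and builds a successor that also reasons in $T$. The naive trust condition is the soundness schema

$$\forall \phi:\quad T \vdash \Box_T \ulcorner \phi \urcorner \rightarrow \phi,$$

i.e. “if my successor proves the action is safe, then it is safe.” But Löb’s theorem says that $T \vdash \Box_T \ulcorner \phi \urcorner \rightarrow \phi$ only when $T \vdash \phi$ already, so a consistent agent cannot endorse this schema about its own theory. Tiling results can be seen as the search for weaker or restructured trust conditions that avoid this obstacle.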

The search for tiling results has two main motivations:

  • AI-AI tiling, for the purpose of finding conditions under which AI systems will want to preserve safety-relevant properties.

  • Human-AI tiling, for the purpose of understanding when we can justifiably trust AI systems.

While I see this as the biggest priority, I also expect to continue a broader project of deconfusion. The bottleneck to progress in AI safety continues to be our confusion about many of the relevant concepts, such as human values.

I’m also still interested in doing some work on accelerating AI safety research using modern AI.

Thoughts on Public vs Private Research

Some work that is worth doing should be done in a non-public, or even highly secretive, way.[1] However, my experience at MIRI has left me feeling somewhat burned out on highly secretive work. It is hard to see how secretive work can have a positive impact on the future (although the story for public work is also fraught). At MIRI, there was always the idea that if we came up with something sufficiently good, something would happen with it, although what exactly that would be was unclear, at least to me.

Secretive research also lacks feedback loops that public research has. My impression is that this slows down the research significantly (contrary to some views at MIRI).

In any case, I personally hope to make my research more open and accessible going forward, although this may depend on my future employer. This means writing more on LessWrong and the Alignment Forum, and perhaps writing academic papers.

As part of this, I hope to hold more of my research video calls as publicly-accessible discussions. I’ve been experimenting with this a little bit and I feel it has been going well so far.


If you’d like to fund my work directly, you can do so via Patreon.

  1. ^

    Roughly, I mean dangerous AI capabilities work, although the “capabilities vs safety” dichotomy is somewhat fraught.