Provide feedback on Open Philanthropy’s AI alignment RFP
Open Philanthropy is planning a request for proposals (RFP) for AI alignment projects working with deep learning systems, and we’re looking for feedback both on the RFP itself and on the research directions we’re soliciting proposals within. We’d be really interested in feedback from people on the Alignment Forum on the current (incomplete) draft of the RFP.
The main RFP text can be viewed here. It links to several documents describing two of the research directions we’re interested in:
Techniques for enhancing human feedback [Edit: this previously linked to an older, incorrect version]
Please feel free to comment either directly on the documents, or in the comments section below.
We are unlikely to add or remove research directions at this stage, but we are open to making any other changes, including to the structure of the RFP. We’d be especially interested in getting the Alignment Forum’s feedback on the research directions we present, and on the presentation of our broader views on AI alignment. It’s important to us that our writing about AI alignment is accurate and easy to understand, and that it’s clear how the research we’re proposing relates to our goals of reducing risks from power-seeking systems.
The implication seems to be that this RFP is for AIS work that is especially focused on DL systems. Is there likely to be a future RFP for AIS research that applies equally well to DL and non-DL systems? Regardless of where my research lands, I imagine a lot of useful and underfunded research fits in the latter category.
This RFP is an experiment for us, and we don’t yet know if we’ll be doing more of them in the future. I think we’d be open to including research directions that we think are promising and that apply equally well to both DL and non-DL systems—I’d be interested in hearing any particular suggestions you have.
(We’d also be happy to fund particular proposals in the research directions we’ve already listed that apply to both DL and non-DL systems, though we will be evaluating them on how well they address the DL-focused challenges we’ve presented.)
I imagine you could capture useful work under i) models of AI safety, or ii) analysis of failure modes, or something along those lines, though I’m obviously biased here.
Thank you for posting this Asya and Nick. After I read it I realized that it connected to something that I’ve been thinking about for a while that seems like it might actually be a fit for this RFP under research direction 3 or 4 (interpretability, truthful AI). I drafted a very rough 1.5-pager this morning in a way that hopefully connects fairly obviously to what you’ve written above:
https://docs.google.com/document/d/1pEOXIIjEvG8EARHgoxxI54hfII2qfJpKxCqUeqNvb3Q/edit?usp=sharing
Interested in your thoughts.
Feedback from everyone is most welcome, too, of course.
Great initiative! I’ll try to leave some comments sometime next week.
Is there a deadline? (I’ve seen the 15th of September floating around, but I guess feedback would be more valuable before then so you can take it into account?)
Also, is this the proposal mentioned by Rohin in his last newsletter, or a parallel effort?
Getting feedback in the next week would be ideal; September 15th will probably be too late.
Different request for proposals!