Uh, I don’t speak for CHAI, and my views differ pretty significantly from e.g. Dylan’s or Stuart’s on several topics. (And other grad students differ even more.) But those seem like reasonable CHAI papers to look at (though I’m not sure how Active IRD relates to corrigibility). Chapter 3 of the Value Learning sequence has some of my takes on reward uncertainty, which probably includes some thoughts about corrigibility somewhere.
Human Compatible also talks about corrigibility iirc, though I think the discussion is pretty similar to the one in the off switch game?
Active IRD doesn’t have anything to do with corrigibility; I guess my mind just switched off when I was writing that. Anyway, how diverse are CHAI’s views on corrigibility? Could you tell me who I should talk to? If I’m understanding you rightly, I’ve already read all the published stuff on it, and I want to make sure that all the perspectives on this topic are covered.
Hmm, I expect each grad student will have a slightly different perspective, but off the top of my head I think Michael Dennis has the most opinions on it. (Other people could include Daniel Filan and Adam Gleave.)
Thanks. Two questions:
Do the staff and faculty have a similar diversity of opinions?
Is messaging chai-info@berkeley.edu the right procedure for contacting your peers here?
Hmm, of the faculty Stuart spends the most time thinking about AI alignment, I’m not sure how much the other faculty have thought about corrigibility—they’ll have views about the off switch game, but not about MIRI-style corrigibility.
Most of the staff doesn’t work on technical research, so they probably won’t have strong opinions. Exceptions: Critch and Karthika (though I don’t think Karthika has engaged much with corrigibility).
Probably the best way is to find emails of individual researchers online and email them directly. I’ve also left a message on our Slack linking to this discussion.