This seems like a strange state of affairs. If he thinks there’s an important problem to be solved, and he has a unique perspective on what solving that problem involves, why hasn’t he produced a paper or blog post or talk to explain what that perspective is? Is he expecting to solve the problem all by himself? Can you share your model of what’s going on?
That’s… hard to answer. I feel like most graduate students at CHAI have a somewhat different opinion of what causes AI risk / what needs to be done to solve it, such that everyone is working on a “different problem”.
Same question here. Aside from yourself, the other CHAI grad students don’t seem to have written up their perspectives of what needs to be done about AI risk. Are they content to just each work on their own version of the problem? Are they trying to work out among themselves which “different problem” is the real one?
Maybe one reason not to write up one’s own “different problem” is that one doesn’t expect to be able to convince anyone else to work on it or to receive useful feedback. If that’s the main reason, I’d argue it’s still important to write it up, in order to give funders, strategists, and policymakers information about how much disagreement there is among AI safety researchers and how many resources are needed to “cover all the bases” in technical AI safety research. If this seems like a reasonable argument, maybe you could help convey it to your professors and fellow students?
This seems like a strange state of affairs. If he thinks there’s an important problem to be solved, and he has a unique perspective on what solving that problem involves, why hasn’t he produced a paper or blog post or talk to explain what that perspective is? Is he expecting to solve the problem all by himself? Can you share your model of what’s going on?
I mean, he has, see Research Priorities for Robust and Beneficial Artificial Intelligence, and the articles you quote. What he hasn’t done is a) read the counterarguments from LessWrongers and b) respond to those counterarguments in particular. When I say I don’t have any recommendations, I mean I don’t have any recommendations of writing that responds to typical LessWrong counterarguments.
My model is very simple—he’s very busy and LessWrongers are at best a small fraction of the people he’s trying to coordinate with, so writing up a response is not worth his time.
For a perhaps easier-to-relate-to example, this is approximately my model for why Eliezer doesn’t respond to critiques of his arguments (1, 2).
Another example: the view I actually wanted to get across with the Value Learning sequence is in Chapter 3. Chapters 1 and 2, and parts of Chapter 3, were written primarily in anticipation of counterarguments from LessWrongers, which made the sequence require significantly more effort on my part.
Same question here. Aside from yourself, the other CHAI grad students don’t seem to have written up their perspectives of what needs to be done about AI risk. Are they content to just each work on their own version of the problem? Are they trying to work out among themselves which “different problem” is the real one?
There is Mechanistic Transparency. But overall I agree that there aren’t many such writeups. I think there’s a combination of factors:
Expecting a failure to communicate. For example, after I wrote the Value Learning sequence, one of the grad students told me that they learned something from it, because it pinpointed the reason why the argument “the AGI must have a utility function” didn’t work—they already knew that the argument was sketchy, but they couldn’t point at a particular flaw before. If they had tried to write about the reasons for their choice of research, depending on how it was written I’d expect the response from LW would be “but none of this matters; superintelligent AI will be an expected utility maximizer”, and the discussion would stall.
Relatedly, not expecting useful feedback because of differing assumptions.
Many intuitions about what research is useful to do are not easy to express explicitly. It’s very possible to think that a particular area is worth investigating, without being able to explain exactly why you think it is worth investigating.
Some are probably still trying to figure out what they do / don’t believe about AI safety, and so are working on things that other people think are important.
Ryan’s point below that writing blog posts on LW is not great for career capital.
I’ve also previously sent you an email about why people at CHAI don’t use the Alignment Forum as much; many of those reasons will apply. (Not copying them here because I didn’t ask them for permission to post publicly.)
I mean, he has, see Research Priorities for Robust and Beneficial Artificial Intelligence,
Thanks for this reference, but it’s co-authored with Daniel Dewey and Max Tegmark and seems to serve as an overview of AI safety research agendas that existed in 2015 rather than Stuart Russell’s personal research priorities. (It actually seems to cite MIRI and Bostrom more than anyone else.)
and the articles you quote.
The ones I looked at all seemed to be written at a very high level for a general audience (not even an ML/AI research audience), and as you noted they seem to be oversimplified relative to his actual views. What is the best reference for explaining his personal view of AI risk/safety? I’m happy to read something that’s written for a non-LW research audience.
(EDIT: Removed part about grad students, as it seems more understandable at this point for them to not have written up their views yet.)
seems to serve as an overview of AI safety research agendas that existed in 2015 rather than Stuart Russell’s personal research priorities.
Fair point (I just skimmed it again; I had last read it over a year ago). In that case I don’t think there is such a reference, which I agree is confusing. He is working on a book about AI safety that is supposed to be published soon, but I don’t know any details about it.
Aside from yourself, the other CHAI grad students don’t seem to have written up their perspectives of what needs to be done about AI risk. Are they content to just each work on their own version of the problem?
I think this is actually pretty strategically reasonable.
Writing papers gives CHAI students high returns on their probability of attaining a top professorship, which is quite beneficial for later recruiting top talent to work on AI safety, and structurally beneficial for establishing AI safety as a field of research. The time they might spend writing up their research strategy does not help with this, nor with recruiting help for their line of work (because other nearby researchers face similar pressures, and because academia is not structured to have PhD students lead large teams).
Moreover, if they are pursuing academic success, they face strong incentives to work on particular problems, and so their research strategies may be somewhat distorted by these incentives, decreasing the quality of a research agenda written in that context.
When I look at CHAI research students, I see some pursuing IRL, some pursuing game theory, some pursuing the research areas of their supervisors (all of which could lead to professorships), and some pursuing projects of other research leaders like MIRI or Paul. This seems healthy to me.