This class of problem does not seem to be addressed by most of the AI-risk approaches that are currently being suggested or developed.
I think these problems are implicitly addressed in both MIRI’s and Paul Christiano’s approaches? I guess what we want to end up with is an AI that understands the potential for Causal Goodhart and takes that risk into account when making decisions (e.g., acting conservatively when the risk is high). This seems well within the scope of what a good decision theory would do (in MIRI’s approach) or what a large-scale human deliberation would consider (in Paul’s approach).
I don’t think it does get implicitly addressed, but I’ve been working on things unrelated to these agendas for a while, and have not followed what else is being done at MIRI. I’d be happy to be more specifically corrected. (And I’m not familiar with Paul’s approach at all.)
From what I understand of MIRI’s decision theory agenda, there is an additional hard problem of correctly identifying causality while minimally interfering with the system, and I don’t think it’s on anyone’s agenda (though again, I may be wrong).
If I understand the results on the topic correctly, there’s another problem: discovering causality is NP-hard, and I don’t think there’s an obvious way to guess safely without determining the answer. I’m unsure whether this issue is as fundamental or as critical. It may be that an AI can safely avoid the problem by identifying areas where causality is not understood, and being extra careful there.
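Setting aside the formal hardness results (which, as I say, are not my area), here is a minimal sketch, entirely my own illustration, of why even brute-force search over causal structures is hopeless: the number of candidate DAGs over n variables grows super-exponentially, per the standard recurrence for counting labeled DAGs.

```python
from math import comb

def count_dags(n):
    """Number of labeled DAGs on n nodes, via the standard recurrence:
    a(n) = sum_{k=1..n} (-1)^(k+1) * C(n,k) * 2^(k*(n-k)) * a(n-k), a(0) = 1."""
    a = [1]  # a(0) = 1: the empty graph
    for m in range(1, n + 1):
        total = 0
        for k in range(1, m + 1):
            total += (-1) ** (k + 1) * comb(m, k) * 2 ** (k * (m - k)) * a[m - k]
        a.append(total)
    return a[n]

# The candidate space explodes long before n = 10, so exhaustive
# structure search is off the table even before any hardness proof.
for n in range(1, 8):
    print(n, count_dags(n))
```

This is only about the size of the search space, not a proof of hardness, but it gives a feel for why "just figure out the causal structure first" is not cheap.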
For naturalized induction, I have not followed the work in detail at all, but in Nate Soares’s MIRI technical agenda from a couple of years back, he says that Solomonoff induction solves the problem for agents outside of the system. For the reasons I outlined above, AIXI/Solomonoff induction is an unsafe utility maximizer if the causal chain is incorrect. To state this differently, the simplest hypothesis that predicts the data is generally predictive in the future, but under regime changes due to causal interaction with the system, that claim seems false. (Yes, this is a significant claim, and should be justified.)
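To make that last claim slightly more concrete, here is a toy sketch (my own construction, not taken from any of these agendas) of the failure mode I have in mind: a model fit on observational data, where a proxy correlates with the goal, predicts well right up until the agent intervenes on the proxy, at which point the correlation, and the model’s predictions, break down.

```python
import numpy as np

rng = np.random.default_rng(0)

# Observational regime: a hidden cause drives both the proxy and the goal,
# so the proxy looks like an excellent predictor of the goal.
hidden = rng.normal(size=10_000)
proxy_obs = hidden + 0.1 * rng.normal(size=10_000)
goal_obs = hidden + 0.1 * rng.normal(size=10_000)
slope, intercept = np.polyfit(proxy_obs, goal_obs, 1)
print("observational correlation:", np.corrcoef(proxy_obs, goal_obs)[0, 1])

# Interventional regime: the agent sets the proxy directly, severing its
# link to the hidden cause. The hidden cause still drives the goal, but
# the proxy now tells us nothing about it, so the fitted model fails.
proxy_int = np.full(10_000, 3.0)  # agent pushes the proxy up
goal_int = rng.normal(size=10_000) + 0.1 * rng.normal(size=10_000)
predicted = slope * proxy_int + intercept
print("predicted goal under intervention:", predicted.mean())
print("actual goal under intervention:   ", goal_int.mean())
```

The point of the sketch is just that the “simplest hypothesis that fits the data” is a regression on the proxy, and it is exactly the agent’s own causal interaction with the system that invalidates it.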
I’m pretty sure that “hard problem of correctly identifying causality” is a major goal of MIRI’s decision theory.
In what sense is discovering causality NP-hard? There’s the trivial sense in which you can embed an NP-hard problem (or tasks of higher complexity) into the real world, and there’s the sense in which inference in Bayesian networks can embed NP-hard problems.
Can you elaborate on why AIXI/Solomonoff induction is an unsafe utility maximizer, even for Cartesian agents?
I will try to edit this to include a more comprehensive reply later, but as this will take me at least another week, I will point to one paper I am already familiar with on the hardness of decisions where causality is unclear: https://arxiv.org/pdf/1702.06385.pdf (Again, computational complexity is not my area of expertise, so I may be wrong.)
Re: safety of Solomonoff/AIXI, I am again unsure, but I think we can posit a situation where, very early in the world-model-building process, the simpler models (which are weighted heavily because of their simplicity) are incorrect in ways that lead to very dangerous information-collection options.
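A toy version of the worry, which I should stress is my own illustrative construction rather than anything from the AIXI literature: with a 2^(-K)-style simplicity prior and only a handful of observations, a simple-but-wrong hypothesis can dominate the posterior, and an agent choosing exploratory actions on that basis would be driven almost entirely by the wrong model.

```python
# Two rival world-models after a few early observations:
#   H_simple:  short description, fits the data so far, but wrong about
#              what happens if the agent pokes the system hard.
#   H_complex: longer description, also fits the data so far, and correct.
# Prior weight ~ 2^(-description length); likelihoods equal so far.
hypotheses = {
    "H_simple (wrong)":  {"length": 10, "likelihood": 1.0},
    "H_complex (right)": {"length": 25, "likelihood": 1.0},
}

unnormalized = {
    name: 2.0 ** -h["length"] * h["likelihood"] for name, h in hypotheses.items()
}
total = sum(unnormalized.values())
posterior = {name: w / total for name, w in unnormalized.items()}

for name, p in posterior.items():
    print(f"{name}: posterior {p:.5f}")
# The simple-but-wrong model gets ~0.99997 of the weight, so any
# value-of-information calculation early on is dominated by it.
```

This obviously doesn’t show that such a situation arises in practice, only that the simplicity weighting alone doesn’t rule it out.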
Apologies for not responding more fully. This is an area where I have only a non-technical understanding, but I came to tentative conclusions on these points, and have had discussions with people more knowledgeable than myself who agreed.