My summary of the review:
HRAD (“highly reliable agent design”) is “work that aims to describe basic aspects of reasoning and decision-making in a complete, principled, and theoretically satisfying way”.
The review breaks HRAD down further into MIRI’s research topics: philosophy, decision theory, logical uncertainty, and Vingean reflection.
MIRI’s position is that, if AI systems become very powerful, even minor mistakes in their design could have catastrophic effects.
If fully complete, HRAD would give us a description of AI systems thorough enough that we could be relatively confident about whether a given AI system would cause catastrophic harm.
Daniel agrees that current formalisms to describe reasoning are incomplete or unsatisfying.
He also agrees that powerful AI systems have the potential to cause serious harm if mistakes are made in their design.
He agrees that we should have some kind of formalism that tells us whether or not an advanced AI system will be aligned.
However, Daniel assigns only a 10% chance that MIRI’s work in HRAD will be helpful in understanding current and future AI designs.
The reasons for this are:
(1) MIRI’s HRAD work does not seem to be applicable to any current machine learning systems.
(2) Mainstream AI researchers haven’t expressed much enthusiasm for MIRI’s HRAD work.
(3) Daniel is more enthusiastic about Paul Christiano’s alternative approach and believes academic AI researchers are as well.
However, he believes MIRI researchers are “thoughtful, aligned with our values, and have a good track record.”
He believes HRAD is currently funding-constrained and somewhat neglected; therefore, if it turns out to be the correct approach, supporting it now could be very beneficial.
Thanks for a more in-depth summary!