Rafael Harth comments on DeepMind article: AI Safety Gridworlds

Rafael Harth 1 Dec 2017 22:25 UTC
6 points
0
To anyone who feels competent enough to answer this: how should we rate this paper? On a scale from 0 to 10 where 0 is a half-hearted handwaving of the problem to avoid criticism and 10 is a fully genuine and technically solid approach to the problem, where does it fall? Should I feel encouraged that DeepMind will pay more attention to AI risk in the future?
- Vika 19 Jan 2018 16:39 UTC
  5 points
  0
  Parent
  (paper coauthor here) When you ask whether the paper indicates that DeepMind is paying attention to AI risk, are you referring to DeepMind’s leadership, AI safety team, the overall company culture, or something else?
  - Rafael Harth 20 Jan 2018 11:38 UTC
    1 point
    0
    Parent
    I was thinking about the DeepMind leadership when I asked, but I’m also very interested in the overall company culture.
    - Vika 20 Jan 2018 16:04 UTC
      4 points
      0
      Parent
      I think the DeepMind founders care a lot about AI safety (e.g. Shane Legg is a coauthor of the paper). Regarding the overall culture, I would say that the average DeepMind researcher is somewhat more interested in safety than the average ML researcher in general.
- magfrump 4 Dec 2017 2:23 UTC
  5 points
  0
  Parent
  I don’t interpret this as an attempt to make tangible progress on a research question, since it presents an environment and not an algorithm. It’s more like an actual specification of a (very small) subset of problems that are important. Without steps like this I think it’s very clear that alignment problems will NOT get solved—I think they’re probably (~90%) necessary but definitely not (~99.99%) sufficient.
  I think this is well within the domain of problems that are valuable to solve for current ML models and deployments, and not in the domain of constraining superintelligences or even AGI. Because of this I wouldn’t say that this constitutes a strong signal that DeepMind will pay more attention to AI risk in the future.
  I’m also inclined to think that any successful endeavor at friendliness will need both mathematical formalisms for what friendliness is (i.e. MIRI-style work) and technical tools and subtasks for implementing those formalisms (similar to those presented in this paper). So I’d say this paper is tangibly helpful and far from complete regardless of its position within DeepMind or the surrounding research community.