Rohin Shah comments on An Analytic Perspective on AI Alignment

Rohin Shah 22 Mar 2020 2:35 UTC
For what it’s worth, I really dislike this terminology. Of course saying “I want X” is normative, and of course it’s based on empirical beliefs.
Here are two claims:
- “If I were in charge of the world, I would ensure that no powerful AI system were deployed unless we had mechanistic transparency into that system, because anything short of that is an unacceptable level of risk”
- “I think that we should push for mechanistic transparency, because by doing so we will cause developers not to deploy dangerous AI systems, because they will use mechanistic transparency techniques to identify when the AI system is dangerous”
There is an axis on which these two claims differ, where I would say the first one is normative and the second one is empirical. The phrase “perfect is the enemy of good” is also talking about this axis. What would you name that axis?
In any case, probably at this point you know what I mean. I would like to see more argumentation for the second kind of claim, and am trying to say that arguments for the first kind of claim are not likely to sway me.
Re: clarification of desideratum, that makes sense.
- DanielFilan 23 Mar 2020 22:35 UTC
  Re: the two claims, that’s different from what I thought you meant by the distinction. I would describe both dot points as being normative claims buttressed by empirical claims. To the extent that I see a difference, it’s that the first dot point is perhaps addressing low-probability risks, while the second is addressing medium-to-high-probability risks. I think that pushing for mechanistic transparency would address medium-to-high-probability risks, but don’t argue for that here, since I think the arguments for medium-to-high-probability risk from AI are better made elsewhere.
  - Rohin Shah 24 Mar 2020 1:19 UTC
    Hmm, I was more pointing at the distinction where the first claim doesn’t need to argue for the subclaim “we will be able to get people to use mechanistic transparency” (it’s assumed away by “if I were in charge of the world”), while the second claim does have to argue for it.
    - DanielFilan 27 Mar 2020 23:50 UTC
      
      > I am mostly interested in allowing the developers of AI systems to determine whether their system has the cognitive ability to cause human extinction, and whether their system might try to cause human extinction.
      
      The way I read this, if the research community enables the developers to determine these things at prohibitive cost, then we mostly haven’t “allowed” them to do it, but if the cost is manageable then we have. So I’d say my desiderata here (and also in my head) include the cost being manageable. If the cost of any such approach were necessarily prohibitive, I wouldn’t be very excited about it.