You already touch on this some, but do you imagine this perspective allowing you, at least ideally, to create a “complete” filter, in the sense that the filtering process would be capable of catching all unsafe and unaligned AI? If so, what are the criteria under which you might be able to achieve that, and if not, I’m curious what predictable gaps you expect your filter to have.
(I think you’ve already given a partial answer in your post, but given the way you set up this post with talk about the filter, I was curious to understand what you explicitly think about this aspect of it.)
I guess I’m imagining transparency tools that combine to say “OK”, “dangerous”, or “don’t know”, and the question is how often it has to answer “don’t know”. Given that analysis tools typically only work for certain types of systems, and ML training takes many forms, I suppose you’ll need to take some pains to ensure that your system is compatible with existing transparency tools. But I haven’t explicitly thought about this very much, and am just giving a quick answer.
Or at least not in a recognisably relevant-to-your-question way.
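To make the “OK” / “dangerous” / “don’t know” combination a little more concrete, here is a minimal sketch of one way a filter might aggregate verdicts from several transparency tools. Everything here is illustrative assumption on my part: the `Verdict` values, the `Tool` type, and the particular combination rule are not from the post or any real library.

```python
# Minimal sketch of a three-valued filter combining transparency-tool
# verdicts. All names are hypothetical, for illustration only.

from enum import Enum
from typing import Callable, List

class Verdict(Enum):
    OK = "OK"
    DANGEROUS = "dangerous"
    DONT_KNOW = "don't know"

# Each tool inspects an opaque trained model and returns a Verdict.
# A tool that doesn't apply to this type of system returns DONT_KNOW.
Tool = Callable[[object], Verdict]

def combined_filter(model: object, tools: List[Tool]) -> Verdict:
    """One possible conservative combination rule:
    any 'dangerous' verdict dominates; 'OK' requires at least one
    tool to positively clear the model; otherwise the filter must
    answer 'don't know'."""
    verdicts = [tool(model) for tool in tools]
    if Verdict.DANGEROUS in verdicts:
        return Verdict.DANGEROUS
    if Verdict.OK in verdicts:
        return Verdict.OK
    return Verdict.DONT_KNOW
```

Under this rule, the frequency of “don’t know” answers falls straight out of tool coverage: if none of the available tools applies to a given system type, the filter can only abstain, which is the gap the comment above is pointing at.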