You already touch on this some, but do you imagine this perspective allowing you, at least ideally, to create a “complete” filter, in the sense that the filtering process would be capable of catching all unsafe and unaligned AI? If so, what are the criteria under which you might be able to achieve that, and if not, I’m curious what predictable gaps you expect your filter to have.
(I think you’ve already given a partial answer in your post, but given the way you set up this post with talk about the filter, I was curious to understand what you explicitly think about this aspect of it.)
I guess I’m imagining transparency tools that combine to say “OK”, “dangerous”, or “don’t know”, and the question is how often it has to answer “don’t know”. Given that analysis tools typically only work for certain types of systems, and ML training takes many forms, I suppose you’ll need to take some pains to ensure that your system is compatible with existing transparency tools. But I haven’t explicitly thought about this very much, and am just giving a quick answer.
Or at least not in a recognisably relevant-to-your-question way.
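To make the “OK” / “dangerous” / “don’t know” combination a little more concrete, here is a minimal sketch of one way a filter might aggregate verdicts from several transparency tools. Everything here is illustrative assumption on my part: the `Verdict` values, the `Tool` type, and the particular combination rule are not from the post or any real library.

```python
# Minimal sketch of a three-valued filter combining transparency-tool
# verdicts. All names are hypothetical, for illustration only.

from enum import Enum
from typing import Callable, List

class Verdict(Enum):
    OK = "OK"
    DANGEROUS = "dangerous"
    DONT_KNOW = "don't know"

# Each tool inspects an opaque trained model and returns a Verdict.
# A tool that doesn't apply to this type of system returns DONT_KNOW.
Tool = Callable[[object], Verdict]

def combined_filter(model: object, tools: List[Tool]) -> Verdict:
    """One possible conservative combination rule:
    any 'dangerous' verdict dominates; 'OK' requires at least one
    tool to positively clear the model; otherwise the filter must
    answer 'don't know'."""
    verdicts = [tool(model) for tool in tools]
    if Verdict.DANGEROUS in verdicts:
        return Verdict.DANGEROUS
    if Verdict.OK in verdicts:
        return Verdict.OK
    return Verdict.DONT_KNOW
```

Under this rule, the frequency of “don’t know” answers falls straight out of tool coverage: if none of the available tools applies to a given system type, the filter can only abstain, which is the gap the comment above is pointing at.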