Something from the cryptography literature on program obfuscation could be relevant here. Barak et al. (2001) proved that black-box obfuscation is impossible for general Turing machines. Here, black-box obfuscation means transforming a program so that nothing can be efficiently computed from the obfuscated source code that couldn’t also be efficiently computed with only black-box access to the program. There are also various extensions of this result to “approximate” obfuscation and to narrower classes of programs.
On the other hand, there are strong results showing that indistinguishability obfuscation is possible (though currently known constructions of it are prohibitively expensive), and surprisingly often it is in fact all you need for practical obfuscation applications. I’m pretty sure that would be enough to defeat Interpretability (though you should consult a cryptographer). Interestingly, indistinguishability obfuscation seems to be a sort of Turing-complete cryptographic primitive: every other cryptographic primitive seems to be constructible using just it.
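For reference, here is a rough sketch of how indistinguishability obfuscation is usually defined. The notation ($i\mathcal{O}$ for the obfuscator, $\lambda$ for the security parameter, $D$ for the distinguisher) is my own choice of symbols, and a cryptographer would state the details more carefully:

```latex
% Sketch of the usual definition; the symbols are assumed notation, not taken from the comment above.
An efficient randomized algorithm $i\mathcal{O}$ is an indistinguishability
obfuscator for a class of circuits if:

\textbf{Functionality.} For every circuit $C$ in the class and every input $x$,
\[
  i\mathcal{O}(1^\lambda, C)(x) = C(x).
\]

\textbf{Indistinguishability.} For every pair of circuits $C_0, C_1$ of the same
size with $C_0(x) = C_1(x)$ for all $x$, and every efficient distinguisher $D$,
\[
  \bigl|\Pr[D(i\mathcal{O}(1^\lambda, C_0)) = 1]
      - \Pr[D(i\mathcal{O}(1^\lambda, C_1)) = 1]\bigr|
  \le \mathrm{negl}(\lambda).
\]
```

Note that this only promises that obfuscations of functionally equivalent circuits look alike. Unlike the black-box notion, it does not directly promise that the obfuscated code hides everything beyond input/output behavior.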