search quality: skimmed the abstracts
search method: Semantic Scholar + browsing
note that many of these results are kind of old
https://www.semanticscholar.org/paper/Explaining-Neural-Networks-by-Decoding-Layer-Schneider-Vlachos/0de6c8de9154a0db199aa433fc19cdfef2a62076
… is cited by https://www.semanticscholar.org/paper/Toward-Transparent-AI%3A-A-Survey-on-Interpreting-the-Raukur-Ho/108a4000b32e3f6eb566151790bfea69c1f3a9db (fun: it cites the EA forum for one of its 300 cites)
… which cites https://www.semanticscholar.org/paper/Understanding-deep-image-representations-by-them-Mahendran-Vedaldi/4d790c8fae40357d24813d085fa74a436847fb49
… which is heavily cited, eg by https://www.semanticscholar.org/paper/Inverting-Visual-Representations-with-Convolutional-Dosovitskiy-Brox/125f7b539e89cd0940ff89c231902b1d4023b3ba
… https://www.semanticscholar.org/paper/Inverting-face-embeddings-with-convolutional-neural-Zhmoginov-Sandler/e44fc62f9fba4c9ad276544901fd1e82caaf7baa
… https://www.semanticscholar.org/paper/Inverting-Convolutional-Networks-with-Convolutional-Dosovitskiy-Brox/993c55eef970c6a11ec367dbb1bf1f0c1d5d72a6
… hmm interesting, here’s a branch off into doing it on the human visual system apparently https://www.semanticscholar.org/paper/Using-deep-learning-to-reveal-the-neural-code-for-Kindel-Christensen/e79b56303a29114762f458d338d0f3b03348d618
… https://www.semanticscholar.org/paper/Visualizing-and-Comparing-AlexNet-and-VGG-using-Yu-Bai/dae981902b1f6d869ef2d047612b90cdbe43fd1e
… https://www.semanticscholar.org/paper/Understading-Image-Restoration-Convolutional-Neural-Protas-Bratti/0c807815ceaa186e99519f59ae6c3ff1ac7defdd
https://www.semanticscholar.org/paper/Towards-Understanding-the-Invertibility-of-Neural-Gilbert-Zhang/487489253b03948a1b1c581986c086d577222e0a
https://www.semanticscholar.org/paper/Analysis-of-Invariance-and-Robustness-via-of-Behrmann-Dittmer/0c11435e0b97b90dfc3928ce242c68289bc757f2
https://www.semanticscholar.org/paper/Deep-Neural-Networks-are-Surprisingly-Reversible%3A-A-Dong-Yin/e8e5f0db724d65f761bd2d415ee46281f8ba751a
https://www.semanticscholar.org/paper/Large-capacity-Image-Steganography-Based-on-Neural-Lu-Wang/d1485d298906364c4434454d25c0ed4389420892
https://www.semanticscholar.org/paper/Robust-Invertible-Image-Steganography-Xu-Mou/786736d89d5bbfa674fabe42ecec32ed8f67901e
https://www.semanticscholar.org/paper/Understanding-and-mitigating-exploding-inverses-in-Behrmann-Vicol/8c0b75099f577cc009065e985cae6986cf755d4d
https://www.semanticscholar.org/paper/The-Effects-of-Invertibility-on-the-Complexity-of-Pareek-Risteski/7bb65e9167e5d21f04ebaacdd7bc59f7c4972bb7
https://www.semanticscholar.org/paper/Evaluating-generalization-through-interval-based-Adam-Likas/f7843d212ddd65de3dc376bb6c146ce78eacf3e0
https://www.semanticscholar.org/paper/Landscape-Learning-for-Neural-Network-Inversion-Liu-Mao/5dad3748e8d4d8c659005903062e5d8e855fa86c ⇐ bold claims, might even read this one properly to see if they hold up
interesting to me but not what you asked for
https://www.semanticscholar.org/paper/The-learning-phases-in-NN%3A-From-Fitting-the-to-a-Schneider/f0c5f3e254b3146199ae7d8feb888876edc8ec8b
https://www.semanticscholar.org/paper/Deceptive-AI-Explanations%3A-Creation-and-Detection-Schneider-Handali/54560c7bce50e57d2396cbf485ff66e5fda83a13
https://www.semanticscholar.org/paper/TopKConv%3A-Increased-Adversarial-Robustness-Through-Eigen-Sadovnik/fd5a74996cc5ef9a6b866cb5608064218d060d16
https://www.semanticscholar.org/paper/This-Looks-Like-That...-Does-it-Shortcomings-of-in-Hoffmann-Fanconi/78396cda15041dda05c5a21c1417683bee2a070b (does this one limit the applicability of “natural abstraction”/“everything’s connected”/relative representations?)
https://www.semanticscholar.org/paper/Self-explaining-AI-as-an-Alternative-to-AI-Elton/301c4c7df87f728e2589a384001e2a2755c5072c
https://www.semanticscholar.org/paper/Pruning-by-Explaining%3A-A-Novel-Criterion-for-Deep-Yeom-Seegerer/ebbe984d3d7bc7edfe0cda0f1fcf49d1533bc3c3
https://www.semanticscholar.org/paper/Pruning-for-Interpretable%2C-Feature-Preserving-in-Hamblin-Konkle/370ee88bb8207651675a8fa5c93de7de4d79db36
https://www.semanticscholar.org/paper/“Will-You-Find-These-Shortcuts”-A-Protocol-for-the-Bastings-Ebert/efe376f566e5ab6113fe8e215abc7ed5149a3848
https://www.semanticscholar.org/paper/Inducing-Causal-Structure-for-Interpretable-Neural-Geiger-Wu/ccd04c27bf1237368b35eb456b3dd1c18ef9a9b9
https://www.semanticscholar.org/paper/Interpreting-Deep-Learning%3A-The-Machine-Learning-Charles/b7488a0ac799a2c62882a5b40f4ea4b1c88f04c4
https://www.semanticscholar.org/paper/Minimizing-Control-for-Credit-Assignment-with-Meulemans-Farinha/0bb32a1b9a8702a38f54b64ca08df8abffc097a8
https://www.semanticscholar.org/paper/The-Union-of-Manifolds-Hypothesis-and-its-for-Deep-Brown-Caterini/3c0a4afc8f430f32442a8efa306f898d9198d7c5