Thanks!
I haven’t managed to grok your loss scales explanation (the “interpretability insights” section) without reading your other post, though.
Not saying anything deep here. The point is just that you might have two cartoon pictures:
1. Every correctly classified input is the result of either a memorizing circuit or a single coherent generalizing circuit. If you remove a single generalizing circuit, your accuracy degrades additively.
2. A correctly classified input is the result of a “combined” circuit consisting of multiple parallel generalizing “subprocesses” giving independent predictions; if you remove any one of these subprocesses, your accuracy degrades multiplicatively.
A lot of ML work only thinks about picture #1 (which is the natural picture to look at if you only have one generalizing circuit and every other circuit is a memorizing circuit). But the thing I’m saying is that picture #2 also occurs, and in some sense is “the info-theoretic default” (though both occur simultaneously; this is also related to the ideas in this post).
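To make the additive vs. multiplicative contrast concrete, here’s a toy calculation (entirely my own made-up numbers, not anything from your post): in picture #1 the circuits cover disjoint chunks of the data, so ablating one subtracts its coverage; in picture #2 every input needs all the parallel subprocesses to go right, so ablating one multiplies accuracy by a factor.

```python
# Toy numbers for the two cartoon pictures; "ablating" a circuit just means
# dropping its contribution. All numbers are made up for illustration.

# Picture #1: disjoint circuits. One generalizing circuit handles 70% of the
# inputs; memorizing circuits cover the remaining 30% one input at a time.
# Ablating the generalizing circuit subtracts its coverage from accuracy.
generalizer_coverage = 0.70
acc_full = 1.00                                  # every input is covered by some circuit
acc_ablated = acc_full - generalizer_coverage    # 0.30: only memorized inputs survive
print(f"picture #1: {acc_full:.2f} -> {acc_ablated:.2f} (additive drop of 0.70)")

# Picture #2: every input needs three independent binary "aspects" resolved,
# one per parallel subprocess. A present subprocess gets its aspect right 98%
# of the time; an ablated one is replaced by a coin flip (50%).
def accuracy(ablated_subprocesses):
    acc = 1.0
    for i in range(3):
        acc *= 0.5 if i in ablated_subprocesses else 0.98   # independent aspects multiply
    return acc

full, one_gone, two_gone = accuracy(set()), accuracy({0}), accuracy({0, 1})
print(f"picture #2: {full:.3f} -> {one_gone:.3f} -> {two_gone:.3f} "
      f"(each ablation roughly halves accuracy: x{one_gone/full:.2f}, x{two_gone/one_gone:.2f})")
```

(The connection to loss: roughly independent subprocesses contribute additive terms to log-loss, which is the same statement as their probabilities multiplying.)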
Looks like a conspiracy of pigeons posing as LW commenters has downvoted your post.