I’m having trouble understanding the n-cut metric used in Filan’s work.
A more intuitive measure, to me, would be the total weight of the edges running between the different subsets of the partition, divided by the total weight of all edges in the graph. That’s not quite what n-cut measures, though: if you look at the equation, it isn’t normalized that way.
It would be nice if there were some figures showing example graphs with different n-cut values, to give an intuitive sense of what n-cut = 9 means versus n-cut = 5.
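To get some intuition, here’s a rough sketch of my own (not code from the paper) comparing my “intuitive” measure with what I believe n-cut is, namely the standard normalized cut: the sum over clusters of the weight leaving each cluster, divided by the total weight incident to that cluster. If that reading is right, each cluster contributes a term between 0 and 1, so with a dozen or so clusters the metric can range up to roughly 11, which at least makes numbers like 10.6 plausible.

```python
# Sketch only: compares my "intuitive" cut fraction with a normalized cut,
# which is what I believe the paper's n-cut is. Assumes an undirected,
# weighted graph given by a symmetric adjacency matrix W.
import numpy as np

def cut_fraction(W, labels):
    """Weight on edges crossing between clusters / total edge weight."""
    n = len(W)
    cross = sum(W[i, j] for i in range(n) for j in range(i + 1, n)
                if labels[i] != labels[j])
    total = W[np.triu_indices_from(W, k=1)].sum()
    return cross / total

def n_cut(W, labels):
    """Normalized cut: for each cluster, (weight leaving the cluster)
    divided by (total weight incident to the cluster), summed over clusters."""
    labels = np.asarray(labels)
    total = 0.0
    for k in set(labels.tolist()):
        in_k = labels == k
        cut = W[np.ix_(in_k, ~in_k)].sum()   # weight crossing out of cluster k
        vol = W[in_k, :].sum()               # degree sum (volume) of cluster k
        total += cut / vol
    return total

# Toy example: two triangles joined by one weak bridge edge.
W = np.zeros((6, 6))
for a, b in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[a, b] = W[b, a] = 1.0
W[2, 3] = W[3, 2] = 0.1                      # weak bridge between the triangles
labels = [0, 0, 0, 1, 1, 1]
print(cut_fraction(W, labels), n_cut(W, labels))
```

For this toy example the intuitive measure comes out around 0.016 and the n-cut around 0.033; the difference is that n-cut normalizes each cluster’s boundary by that cluster’s own volume rather than by the whole graph.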
Looking at the latest version of the paper (the earlier one seems to have some errors), and specifically at figure 6: if we focus on just the FASHION dataset results, the control networks had n-cut ≈ 10.6 and the dropout networks had only slightly lower values, around 10.3 (if I’m understanding this correctly). The L1-regularized networks had a slightly lower n-cut (around 10), and the L2-regularized ones had n-cuts going down to 8 (note that the results are sort of all over the map, though). It makes sense to me that dropout would lead to less clusterability, because you end up with multiple redundant sub-networks in different places all doing the same thing.
Anyway, my main question is whether a decrease in n-cut from ~11 to 8 is significant. What about going from 8.5 to 5, as in figure 8?
It’s odd that L1 improves clusterability relative to L2 in figure 8 but not in figure 6. Intuitively I would expect L1 to improve clusterability, since under L1 it’s easier for weights to get suppressed to exactly zero, which effectively deletes edges from the graph.
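For what it’s worth, here’s the toy picture behind that intuition (my own sketch, nothing from the paper): a proximal/shrinkage step for an L1 penalty soft-thresholds the weights, so small ones land at exactly zero and their edges vanish from the graph, whereas the corresponding L2 step only scales weights down and never produces exact zeros.

```python
# Toy illustration (not from the paper): compare the proximal update for an
# L1 penalty (soft-thresholding) with the one for an L2 penalty (shrinkage).
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(scale=0.5, size=1000)   # stand-in for a layer's weights
lam = 0.1                              # penalty strength (arbitrary choice)

w_l1 = np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)  # exact zeros wherever |w| < lam
w_l2 = w / (1.0 + lam)                                # shrinks, but never reaches zero

print("fraction of exact zeros under L1:", np.mean(w_l1 == 0.0))  # a sizeable fraction
print("fraction of exact zeros under L2:", np.mean(w_l2 == 0.0))  # essentially none
```

Of course, plain SGD with an L1 term doesn’t literally apply this proximal update, so in practice weights hover near zero rather than landing exactly on it, but the qualitative picture (much heavier concentration at or near zero under L1 than under L2) should carry over.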
As a side note, I’m glad they changed the title: “Neural networks are surprisingly modular” is a rather unscientific name for a paper. Scientific papers should stick to facts, not subjective impressions like how surprised the authors are. (Other authors are guilty of using “surprising” in a title too; see here.)
As for Olah’s work, I think it’s the best I’ve seen for visualizing the internal workings of neural nets, but I’m also a bit worried that his methods, as they stand, are missing a lot, for instance the non-robust features described in this paper. Also, the change-of-basis issue, where random directions often give meaningful/interpretable visualizations, looks like a serious problem to me (see the section “Interactions between neurons” here). But that’s a whole separate discussion.