Charlie Steiner comments on Transparency and AGI safety

Charlie Steiner 15 Jan 2021 7:36 UTC
5 points
Dropout makes interpretation easier because it disincentivizes complicated features whose parts can only be understood through their high-order correlations with other parts. A feature that relies on such correlations is fragile: it breaks whenever some of its pieces are dropped out.
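
To put a rough number on that fragility, here is a minimal sketch. It assumes standard dropout, where each unit is dropped independently with probability `p_drop`, and it treats a feature as "intact" only if every one of the `k` units it depends on survives. The function name and the all-or-nothing notion of "intact" are illustrative assumptions, not anything from the post.

```python
def survival_probability(k: int, p_drop: float) -> float:
    """Chance that all k units a feature jointly depends on are kept
    in a single forward pass, under independent dropout."""
    return (1.0 - p_drop) ** k


# With p_drop = 0.5, a feature living in one unit is intact half the time,
# while a feature spread across four interdependent units is intact
# only 1 time in 16.
for k in (1, 2, 4):
    print(k, survival_probability(k, 0.5))
```

The survival chance shrinks geometrically in `k`, which is one way to see why training pressure would favor features that do not need many parts to co-occur.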

Anti-dropout promotes consolidation of similar features into one, but it also incentivizes that one feature to be maximally complicated and fragile.

Re: first idea. Yeah, something like that. Basically just an attempt to formalize “functionally similar neurons,” so that when you go to drop out a neuron, you actually drop out all the functionally similar ones too.
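
A rough sketch of what that could look like, with heavy caveats: I am assuming "functionally similar" means "activations are strongly correlated over a batch of inputs," which is only one possible formalization. The threshold, the use of Pearson correlation, and all function names below are made up for illustration.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length activation lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant neuron is treated as similar to nothing
    return cov / (vx * vy) ** 0.5


def similar_groups(activations, threshold=0.9):
    """Assign a group label to each neuron.

    activations[i] is neuron i's activation on each input in a batch.
    Neurons are merged (union-find) when |correlation| >= threshold,
    so similarity is treated as transitive. That is an assumption.
    """
    n = len(activations)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if abs(pearson(activations[i], activations[j])) >= threshold:
                parent[find(j)] = find(i)
    return [find(i) for i in range(n)]


def group_dropout_mask(groups, p_drop, rng):
    """Drop whole groups at once: 0 = dropped, 1 = kept."""
    dropped = {g for g in sorted(set(groups)) if rng.random() < p_drop}
    return [0 if g in dropped else 1 for g in groups]


# Neurons 0 and 1 are scaled copies of each other; neuron 2 is unrelated.
acts = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 1, 3, 2]]
groups = similar_groups(acts)
mask = group_dropout_mask(groups, p_drop=0.5, rng=random.Random(0))
print(groups, mask)
```

The point of dropping by group rather than by neuron is that a redundant copy can no longer cover for a dropped unit, so the network cannot get robustness just by duplicating a fragile feature.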