RobertKirk comments on Engineering Monosemanticity in Toy Models

RobertKirk 23 Nov 2022 16:42 UTC

You might well expect that features just get ignored below some threshold and monosemantically represented above it, or it could be that you just always get a polysemantic morass in that limit

I guess the recent work on Polysemanticity and Capacity seems to suggest the latter case, especially in sparser settings, given the zone where multiple features are represented polysemantically. I can’t remember, though, whether they investigate power-law feature frequencies or just uniform frequencies.

were a little concerned about going down a rabbit hole given some of the discussion around whether the results replicated, which indicated some sensitivity to optimizer and learning rate.

My impression is that that discussion was more about whether the empirical results held up (i.e. do ResNets have linear mode connectivity?), rather than whether the methodology in the code base could be used to find whether linear mode connectivity (LMC) is present between two models (up to permutation) for a given dataset. I imagine you could adapt the code fairly quickly to check for LMC between two trained models (it’s something I’m considering trying to do as well, hence the code requests).
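To make the LMC check concrete, here is a minimal sketch of the loss-barrier measurement along the straight line between two flattened weight vectors. It is not the code base discussed above, and it skips the permutation-alignment step entirely: it assumes the two models have already been aligned. The names `interpolate`, `loss_barrier`, and `loss_fn` are hypothetical, and `loss_fn` stands in for whatever evaluates a model's loss from a flat parameter vector.

```python
import numpy as np


def interpolate(theta_a, theta_b, alpha):
    """Point on the straight line between two flat parameter vectors."""
    return (1.0 - alpha) * theta_a + alpha * theta_b


def loss_barrier(loss_fn, theta_a, theta_b, num_points=11):
    """Largest excess of the path loss over the interpolated endpoint losses.

    A barrier close to zero is the usual operational sign of linear mode
    connectivity. Assumes theta_a and theta_b are already permutation-aligned
    (that alignment step is NOT implemented here).
    """
    alphas = np.linspace(0.0, 1.0, num_points)
    losses = np.array(
        [loss_fn(interpolate(theta_a, theta_b, a)) for a in alphas]
    )
    # Baseline: linear interpolation of the two endpoint losses.
    baseline = (1.0 - alphas) * losses[0] + alphas * losses[-1]
    barrier = float(np.max(losses - baseline))
    return barrier, alphas, losses


def double_well_loss(theta):
    """Toy loss with two separate minima at (-1, 0) and (1, 0)."""
    m1 = np.array([-1.0, 0.0])
    m2 = np.array([1.0, 0.0])
    return float(min(np.sum((theta - m1) ** 2), np.sum((theta - m2) ** 2)))


def bowl_loss(theta):
    """Toy convex loss with a single basin."""
    return float(np.sum(theta ** 2))
```

On the toy double-well loss the two minima are separated by a bump, so the barrier is positive, whereas on the convex bowl the barrier is zero. With real networks, `loss_fn` would load the interpolated parameters into the model and evaluate it on held-out data.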

I think (at least in our case) it might be simpler to get at this question, and I think the first thing I’d do to understand connectivity is ask “how much regularization do I need to move from one basin to the other?” So for instance suppose we regularized the weights to directly push them from one basin towards the other, how much regularization do we need to make the models actually hop?

That would definitely be interesting to see. I guess this is kind of presupposing that the models are in different basins (which I also believe, but which hasn’t yet been verified). I also think looking at basins and connectivity would be more interesting in the case where there was more noise, whether from initialisation, inherent in the data, or from using a much smaller batch size so that SGD was noisy. In this case it’s less likely that the same configuration results in the same basin, but if your interventions are robust to these kinds of noise then it’s a good sign.

Good question! We haven’t tried that precise experiment, but have tried something quite similar. Specifically, we’ve got some preliminary results from a prune-and-grow strategy (holding sparsity fixed, pruning smallest-magnitude weights, enabling non-sparse weights) that does much better than a fixed sparsity strategy.

I’m not quite sure how to interpret these results in terms of the lottery ticket hypothesis though. What evidence would you find useful to test it?

That’s cool, looking forward to seeing more detail. I think these results don’t seem that related to the LTH (if I understand your explanation correctly), as the LTH involves finding sparse subnetworks in dense ones. Possibly it only actually holds in models with many more parameters; I haven’t seen it investigated in models that aren’t overparametrised in a classical sense.

I think if iterative magnitude pruning (IMP) on these problems produced much sparser subnetworks that also maintained the monosemanticity levels, then that would suggest that sparsity doesn’t penalise monosemanticity (or polysemanticity) in this toy model, and also (much more speculatively) that the sparse well-performing subnetworks that IMP finds in other networks possibly also maintain their levels of poly/mono-semanticity. If we also think these networks are favoured towards poly or mono, then that hints at how the overall learning process is favoured towards poly or mono.
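For reference, a rough sketch of the IMP loop I have in mind, in the lottery-ticket style where the weights are rewound to their initial values after each pruning round. Everything here is illustrative: `magnitude_prune_mask`, `iterative_magnitude_pruning`, and the `train_fn` callback are hypothetical names, and `train_fn` is assumed to train the masked network and return the trained weights. Measuring monosemanticity of the resulting subnetwork is not covered.

```python
import numpy as np


def magnitude_prune_mask(weights, mask, prune_fraction):
    """Drop the smallest-magnitude fraction of the weights still in the mask."""
    surviving = np.flatnonzero(mask)
    num_to_prune = int(np.floor(prune_fraction * surviving.size))
    if num_to_prune == 0:
        return mask.copy()
    magnitudes = np.abs(weights.ravel()[surviving])
    # Flat indices of the surviving weights with the smallest magnitudes.
    prune_idx = surviving[np.argsort(magnitudes)[:num_to_prune]]
    new_mask = mask.copy().ravel()
    new_mask[prune_idx] = False
    return new_mask.reshape(mask.shape)


def iterative_magnitude_pruning(init_weights, train_fn, rounds, prune_fraction):
    """Alternate training and pruning, rewinding to init_weights each round.

    train_fn(weights, mask) is a hypothetical callback: it should train the
    network with pruned weights held at zero and return the trained weights.
    """
    mask = np.ones_like(init_weights, dtype=bool)
    for _ in range(rounds):
        # Rewind: every round restarts from the masked initial weights.
        trained = train_fn(init_weights * mask, mask)
        mask = magnitude_prune_mask(trained, mask, prune_fraction)
    return mask
```

The test I’m describing would then compare the monosemanticity of the network trained under the final mask against that of the dense network.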