I’m not entirely sure I follow here; I am thinking of compositionality as a feature of the format of a representation (Chris Olah has a good note on this here https://transformer-circuits.pub/2023/superposition-composition/index.html).
I think whether we should expect one kind of representation or another is an interesting question, but ultimately an empirical one: there are some theoretical arguments for linear representations (basically that it’s easy for NNs to make decisions based on them), but the biggest reason to believe in them is just that people genuinely have found lots of examples of linear mediators that seem quite robust (e.g. Golden Gate Claude, Neel’s work on refusal directions).
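(To make “linear mediator” concrete, here’s a minimal sketch of the usual recipe, assuming cached residual-stream activations; the names and data are illustrative, not the actual code behind those results:)

```python
# Sketch of the "linear mediator" idea: find a direction as the difference of
# mean activations between two contrastive prompt sets, then steer by adding
# or projecting out that direction. All names/data here are illustrative.
import numpy as np

rng = np.random.default_rng(0)
d_model = 16

# Stand-ins for cached residual-stream activations on two contrastive prompt sets.
acts_with_concept = rng.normal(size=(100, d_model)) + 2.0 * np.eye(d_model)[0]
acts_without_concept = rng.normal(size=(100, d_model))

# The candidate linear mediator: a single direction in activation space.
direction = acts_with_concept.mean(axis=0) - acts_without_concept.mean(axis=0)
direction /= np.linalg.norm(direction)

def steer(activation: np.ndarray, alpha: float) -> np.ndarray:
    """Add (alpha > 0) or subtract (alpha < 0) the concept direction."""
    return activation + alpha * direction

def ablate(activation: np.ndarray) -> np.ndarray:
    """Project out the direction (zero its component along `direction`)."""
    return activation - (activation @ direction) * direction

x = acts_without_concept[0]
print("component before/after ablation:",
      float(x @ direction), float(ablate(x) @ direction))
```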
Yeah, I was probably equivocating confusingly between compositionality as a feature of the representation, and compositionality as a feature of the manifold that the data / activation distribution lives near.
If you imagine the manifold, then compositionality is the ability to have a coordinate system / decomposition such that you can take two points on the manifold, do some operation like averaging or recombination on their coordinates, and get a new point that’s still on the manifold. (I guess this only makes sense if the data / activation distribution doesn’t fill up the entire available space.) A toy sketch of that picture is below.
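(Here’s a toy construction of my own, just to pin down the manifold picture: activations lie near a low-dimensional manifold parameterized by independent factors, and recombining points in factor coordinates keeps you on the manifold, whereas averaging raw ambient coordinates generally doesn’t.)

```python
# Toy illustration: a torus-like manifold in R^4 parameterized by two factors.
# Recombining two points in factor coordinates stays on the manifold; averaging
# their raw ambient coordinates usually leaves it.
import numpy as np

def embed(theta: float, phi: float) -> np.ndarray:
    """Map factor coordinates (theta, phi) onto the manifold in R^4."""
    return np.array([np.cos(theta), np.sin(theta), np.cos(phi), np.sin(phi)])

def on_manifold(x: np.ndarray, tol: float = 1e-8) -> bool:
    """A point is on the manifold iff both 2D slices have unit norm."""
    return (abs(np.linalg.norm(x[:2]) - 1) < tol and
            abs(np.linalg.norm(x[2:]) - 1) < tol)

a = embed(theta=0.3, phi=2.0)
b = embed(theta=1.5, phi=-0.7)

# Recombine in factor coordinates: theta from a, phi from b.
recombined = embed(theta=0.3, phi=-0.7)

# Naive averaging in ambient coordinates.
ambient_avg = (a + b) / 2

print("recombined on manifold:", on_manifold(recombined))        # True
print("ambient average on manifold:", on_manifold(ambient_avg))  # typically False
```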