leogao comments on leogao’s Shortform

leogao 7 Dec 2024 15:46 UTC
27 points
9
a take I’ve expressed a bunch irl but haven’t written up yet: feature sparsity might be fundamentally the wrong thing for disentangling superposition; circuit sparsity might be more correct to optimize for. in particular, circuit sparsity doesn’t have problems with feature splitting/absorption
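
a minimal sketch of one way to read the two objectives, not anything the comment itself specifies. it assumes feature sparsity is measured as an L1 penalty on activations (as in typical sparse-autoencoder training) and circuit sparsity as a count of nonzero connections between units. the function names, the threshold, and the toy arrays are all hypothetical:

```python
import numpy as np


def feature_sparsity_penalty(activations):
    """Mean per-example L1 norm of the activations.

    Assumption: this stands in for "feature sparsity", i.e. few units
    are active on any given input.
    """
    return float(np.abs(activations).sum(axis=1).mean())


def circuit_sparsity_penalty(weights, threshold=1e-6):
    """Number of connections whose magnitude exceeds `threshold`.

    Assumption: this stands in for "circuit sparsity", i.e. few edges
    connect units in one layer to units in the next.
    """
    return int(sum((np.abs(w) > threshold).sum() for w in weights))


# Toy data (hypothetical): 2 examples x 3 units, then two small weight matrices.
acts = np.array([[0.0, 2.0, 0.0],
                 [1.0, 0.0, 0.0]])
w1 = np.array([[1.0, 0.0],
               [0.0, 0.0],
               [0.0, -0.5]])
w2 = np.array([[0.0],
               [3.0]])

feature_cost = feature_sparsity_penalty(acts)
circuit_cost = circuit_sparsity_penalty([w1, w2])
```

the first penalty depends only on how many units fire and how strongly; the second depends only on how many connections the computation uses, regardless of how the activations are split across units.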
- Sodium 7 Dec 2024 18:54 UTC
  2 points
  −2
  Yeah, my view is that as long as our features/intermediate variables form human-understandable circuits, it doesn’t matter how “atomic” they are.