I agree that reducing superposition is probably valuable even if it requires a significantly larger network. I still don’t understand why the transition from float to binary would cause a dramatic reduction in superposition capacity. If it does prevent superposition, great! I’ll just give the network more parameters as needed. But if we still get superposition, I will need to apply other techniques to make it stop.
(I have not yet finished my closer re-read of Toy Models of Superposition after my initial skimming. Perhaps once I do I will understand better.)
Hopefully in a few months I will have empirical data on how many more neurons we need. Then I can stop hand-waving about vague intuitions.
If we can get the unwanted cognition/behaviors to sit entirely in their own section of weights, we can then ablate the unwanted behaviors without losing wanted capability. That’s my hope anyway.
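To make the ablation idea concrete, here is a minimal NumPy sketch of the best case I’m hoping for. All names and the layer setup are my own illustration (nothing here comes from an actual experiment): a single linear layer with no superposition, where specific output units carry the unwanted behavior, so zeroing their weight rows removes that behavior while leaving the wanted outputs bit-for-bit unchanged.

```python
import numpy as np

# Hypothetical setup: 5 output units over 4 input features, where units 0-2
# carry "wanted" behavior and units 3-4 carry "unwanted" behavior, with no
# superposition (each unit serves exactly one role).
rng = np.random.default_rng(0)
W = rng.normal(size=(5, 4))
wanted, unwanted = slice(0, 3), slice(3, 5)

x = rng.normal(size=4)               # an arbitrary input
before = W @ x

W_ablated = W.copy()
W_ablated[unwanted, :] = 0.0         # ablate the unwanted section of weights
after = W_ablated @ x

# Wanted outputs are untouched; unwanted outputs are silenced.
assert np.allclose(before[wanted], after[wanted])
assert np.allclose(after[unwanted], 0.0)
```

If superposition is present, the same ablation would instead damage wanted capability, since the zeroed weights would also be carrying parts of other features, which is exactly why I want superposition gone first.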
My thoughts and hope as well.