Huh, I actually tried this: training IA3, which multiplies each activation by a learned scalar, and then using that scalar as the importance of the activation. It seems like a natural way to use backprop to learn an importance matrix, but it only gave small (1-2%) increases in accuracy. Strange.
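Roughly what I mean, as a minimal sketch (the toy MLP, sizes, and the way importance is read off are illustrative placeholders, not my exact setup):

```python
import torch
import torch.nn as nn

class IA3Scale(nn.Module):
    """Wraps a layer and multiplies its output by a learned vector (IA3-style).

    After training, the learned scales can be read off as per-activation
    importance: activations whose scale drifts far from 1 matter more.
    """
    def __init__(self, layer: nn.Module, dim: int):
        super().__init__()
        self.layer = layer
        self.scale = nn.Parameter(torch.ones(dim))  # IA3 vector, initialized at identity

    def forward(self, x):
        return self.layer(x) * self.scale           # elementwise rescale of activations

    def importance(self):
        # One choice of importance score: distance of each scale from 1.
        return (self.scale - 1.0).abs().detach()

# Toy usage: scale the hidden activations of a small MLP and train only the scales,
# keeping the base weights frozen, as IA3 does with a pretrained model.
hidden = 32
mlp = nn.Sequential(
    nn.Linear(8, hidden),
    IA3Scale(nn.ReLU(), hidden),   # scales the post-ReLU activations
    nn.Linear(hidden, 2),
)
for name, p in mlp.named_parameters():
    p.requires_grad_("scale" in name)              # freeze everything except the IA3 vector

opt = torch.optim.Adam([p for p in mlp.parameters() if p.requires_grad], lr=1e-2)
x, y = torch.randn(64, 8), torch.randint(0, 2, (64,))
for _ in range(100):
    loss = nn.functional.cross_entropy(mlp(x), y)
    opt.zero_grad(); loss.backward(); opt.step()

print(mlp[1].importance())  # per-activation importance estimate
```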
I also tried using a VAE, introducing sparsity by tokenizing the latent space, and this seems to work: at least, probes can overfit to complex concepts using the learned tokens.
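A sketch of the tokenized-latent idea, reading "tokenizing the latent space" as VQ-VAE-style vector quantization (the autoencoder architecture, codebook size, and loss weights here are assumptions, not my exact setup):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TokenizedLatentVAE(nn.Module):
    """Autoencoder whose latent is snapped to a small codebook (VQ-VAE style).

    The discrete code indices act as "tokens" over the latent space, which is
    where the sparsity comes from: each example is described by a codebook
    entry instead of a dense continuous vector.
    """
    def __init__(self, in_dim=128, latent_dim=16, n_tokens=64):
        super().__init__()
        self.enc = nn.Linear(in_dim, latent_dim)
        self.dec = nn.Linear(latent_dim, in_dim)
        self.codebook = nn.Embedding(n_tokens, latent_dim)

    def forward(self, x):
        z = self.enc(x)                               # continuous latent
        # Nearest codebook entry per example (the "token").
        dists = torch.cdist(z, self.codebook.weight)  # (batch, n_tokens)
        tokens = dists.argmin(dim=-1)
        z_q = self.codebook(tokens)
        # Straight-through estimator: the decoder sees the quantized latent,
        # but gradients flow back to the encoder as if it were continuous.
        z_st = z + (z_q - z).detach()
        recon = self.dec(z_st)
        # Codebook + commitment losses, as in VQ-VAE.
        vq_loss = F.mse_loss(z_q, z.detach()) + 0.25 * F.mse_loss(z, z_q.detach())
        return recon, tokens, vq_loss

# The discrete tokens are what the probe trains on, e.g. a linear classifier
# over one-hot token indices, to check whether they carry the concept.
model = TokenizedLatentVAE()
x = torch.randn(32, 128)
recon, tokens, vq_loss = model(x)
loss = F.mse_loss(recon, x) + vq_loss
loss.backward()
print(tokens[:8])  # discrete latent "tokens" for the first few examples
```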