I thought I had some informal evidence that permuting the latents was good, but after double-checking that evidence I no longer feel confident that it is. Training without permutation seems to attain slightly better FVU/L0, produces reasonable-looking features at a quick glance, seems to solve the toy model at rates comparable to the permuted version, and is simpler to code.