I did not use your initialization scheme, since I was unaware of your paper at the time I was running those experiments. I will definitely try that soon!
Yeah, I can see how leaky topk and multi-topk are doing similar things. I wonder if leaky topk also gives a progressive code past the value of k used in training. That definitely seems worth looking into. Thanks for the suggestions!
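
For anyone following along, here's a minimal sketch of the contrast as I understand it, assuming a standard SAE pre-activation vector. The leak factor `alpha` and the function names are my own illustrative choices, not taken from either paper:

```python
import torch

def topk_activation(z: torch.Tensor, k: int) -> torch.Tensor:
    """Standard TopK: keep the k largest pre-activations, zero the rest."""
    values, indices = torch.topk(z, k, dim=-1)
    out = torch.zeros_like(z)
    out.scatter_(-1, indices, values)
    return out

def leaky_topk_activation(z: torch.Tensor, k: int, alpha: float = 0.01) -> torch.Tensor:
    """Leaky TopK: keep the top k at full strength, but scale the tail by
    alpha instead of zeroing it, so some signal/gradient reaches sub-top-k
    latents."""
    values, indices = torch.topk(z, k, dim=-1)
    out = alpha * z  # the "leak": sub-threshold latents attenuated, not killed
    out.scatter_(-1, indices, values)  # restore top-k latents at full value
    return out
```

If the leak does induce a meaningful ordering in the tail, one way to check the progressive-code question empirically would be to truncate the trained latents to various k' > k at evaluation time and see whether reconstruction keeps improving smoothly, the same way multi-topk is evaluated.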