I’m slightly confused about the setup. In the following, what spaces is W mapping between?
Linear: y=Wx
At first I expected W : R^{d_model} → R^{d_model}. But then it wouldn’t make sense to impose a sparsity penalty on W.
In other words: what is the shape of the matrix W?