    ## Discretize B ##
    #   [B,N]      [E->N]   [B,E]
    B = layer.W_B(x[b,l])   # no bias
Shouldn’t this be x[:,l] instead of x[b,l]?
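A quick shape check may help settle this. The sketch below assumes `x` is laid out as `[B, L, E]`, as the shape comment suggests, and uses a random NumPy matrix as a stand-in for `layer.W_B` (a bias-free linear map from E to N). The sizes and the random data are hypothetical, chosen only for the demo.

```python
import numpy as np

# Hypothetical sizes, chosen only for this demo.
B, L, E, N = 2, 3, 4, 5

rng = np.random.default_rng(0)
x = rng.standard_normal((B, L, E))   # assumed input layout [B, L, E]
W_B = rng.standard_normal((N, E))    # stand-in for layer.W_B (no bias)

l = 1  # one time step
b = 0  # one batch element

# Slicing the whole batch at step l keeps the batch dimension.
batched = x[:, l] @ W_B.T            # [B, E] @ [E, N] -> [B, N]

# Indexing a single batch element drops it.
single = x[b, l] @ W_B.T             # [E] @ [E, N] -> [N]

print(batched.shape)
print(single.shape)
```

Under these assumptions, only `x[:,l]` produces the `[B,N]` result that the shape comment describes. `x[b,l]` yields a single `[N]` vector, which would match the annotation only if the line sits inside an explicit loop over `b` and the comment is read per batch element.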