Chakshu Mira comments on Ophiology (or, how the Mamba architecture works)

Chakshu Mira 2 May 2024 22:05 UTC
```
## Discretize B ##
# [B,N]         [E->N]   [B,E]
B     = layer.W_B(x[b,l]) # no bias
```
Shouldn’t this be x[:,l] instead of x[b,l]?
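Here is a small shape check that illustrates the question. It assumes that `x` has shape `[B, L, E]`, which the `[B,E]` annotation suggests, and that `layer.W_B` is a bias-free linear map from E to N. The sizes, the helper `project_B`, and the use of numpy in place of the PyTorch layer are all hypothetical choices for the sketch. Slicing with `x[:, l]` gives the annotated `[B,E]` input and a `[B,N]` output. Indexing with `x[b, l]` selects a single batch element and gives only `[E]` in and `[N]` out.

```python
import numpy as np

# Hypothetical sizes: batch B, sequence length L, expanded dim E, state dim N.
B, L, E, N = 2, 5, 8, 4
rng = np.random.default_rng(0)

x = rng.standard_normal((B, L, E))  # assumed input layout [B, L, E]
W_B = rng.standard_normal((N, E))   # weight of a bias-free E->N map (nn.Linear layout: [out, in])

def project_B(x_t):
    """Hypothetical stand-in for layer.W_B: apply the E->N projection with no bias."""
    return x_t @ W_B.T

l = 3
# All batch elements at position l: [B, E] -> [B, N], matching the shape comment.
B_all = project_B(x[:, l])

# A single batch element b at position l: [E] -> [N], so the batch axis is lost.
b = 0
B_one = project_B(x[b, l])

print(B_all.shape)  # (2, 4)
print(B_one.shape)  # (4,)
```

The results are named `B_all` and `B_one` rather than `B` so they do not overwrite the batch size. Row `b` of `B_all` equals `B_one`, so the `[:, l]` form computes the same projection for every batch element at once.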