I do not know who is right about Mamba here and it doesn’t seem easy to check?
Nora is definitely right in this case (Mamba isn’t doing dramatically more serial operations than Transformers). If it was, it would be slower running in not-batch-mode, whereas the whole point of Mamba is it’s faster (due to sub-quadratic attention).
Nora is definitely right in this case (Mamba isn’t doing dramatically more serial operations than Transformers). If it was, it would be slower running in not-batch-mode, whereas the whole point of Mamba is it’s faster (due to sub-quadratic attention).