Do you happen to know how this compares with https://github.com/BlinkDL/RWKV-LM, which is described as an RNN with performance comparable to a transformer / linear attention?
I don’t know, but I’d love to know! If you find out, please tell me!