We view it as an interesting open question whether it is possible to develop SSM-like models with greater expressivity for state tracking that also have strong parallelizability and learning dynamics
Surely fundamentally at odds? You can’t spend a while thinking without spending a while thinking.
Of course, the lunch might still be very cheap if the model only has to spend a while thinking a fraction of the time, or something like that.