[Linkpost] The Expressive Capacity of State Space Models: A Formal Language Perspective


Paper authors: Yash Sarrof, Yana Veitsman, Michael Hahn.

Context: architectures with weak forward passes can be differentially transparent; see e.g. this comment / the whole thread and research agendas like externalized reasoning or the translucent thought hypothesis.

Summary thread: https://x.com/yashYRS/status/1795340993757352402. Summary of the summary thread: like transformers, which have weak forward passes, SSMs are also in the TC0 computational complexity class, ‘but cover distinct fragments within it’. ‘SSMs can track hierarchical structures with optimal memory [...] suggesting that SSMs, while being more parallelizable, maintain sufficient power to handle the hierarchical structure of language.’

Abstract:

Recently, recurrent models based on linear state space models (SSMs) have shown promising performance in language modeling (LM), competitive with transformers. However, there is little understanding of the in-principle abilities of such models, which could provide useful guidance to the search for better LM architectures. We present a comprehensive theoretical study of the capacity of such SSMs as it compares to that of transformers and traditional RNNs. We find that SSMs and transformers have overlapping but distinct strengths. In star-free state tracking, SSMs implement straightforward and exact solutions to problems that transformers struggle to represent exactly. They can also model bounded hierarchical structure with optimal memory even without simulating a stack. On the other hand, we identify a design choice in current SSMs that limits their expressive power. We discuss implications for SSM and LM research, and verify results empirically on a recent SSM, Mamba.
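
To make the ‘straightforward and exact solutions’ claim a bit more concrete, here is a minimal toy sketch (my own illustration, not the paper's construction) of how an input-dependent diagonal linear recurrence of the kind used in selective SSMs can exactly solve the flip-flop task, a canonical star-free state-tracking problem; the symbol names and gate values are choices I made for the sketch:

```python
# Toy illustration: a single-channel, input-dependent ("selective") diagonal
# linear recurrence h_t = a_t * h_{t-1} + b_t that exactly solves flip-flop:
# remember the last bit that was written, and ignore read instructions.

def flip_flop_ssm(tokens):
    """tokens: sequence of 'w0' (write 0), 'w1' (write 1), 'r' (read).
    Returns the state after each step; after any 'r' it equals the last written bit."""
    h = 0.0
    outputs = []
    for tok in tokens:
        if tok == 'w0':    # overwrite state with 0: gate a=0, input b=0
            a, b = 0.0, 0.0
        elif tok == 'w1':  # overwrite state with 1: gate a=0, input b=1
            a, b = 0.0, 1.0
        else:              # read: keep state unchanged: gate a=1, input b=0
            a, b = 1.0, 0.0
        h = a * h + b      # the linear SSM recurrence
        outputs.append(h)
    return outputs

print(flip_flop_ssm(['w1', 'r', 'w0', 'r', 'r', 'w1', 'r']))
# [1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0]
```

Because the gates depend only on the current input symbol, the same state update can be computed with an associative parallel scan, which is the sense in which such a solution stays parallelizable while tracking state exactly.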