Posted on Twitter:
Opus can operate as a Turing machine.
given only existing tapes, it learns the rules and computes new sequences correctly.
100% accurate over 500+ 24-step solutions (more tests running).
for 100% at 24 steps, the input tapes weigh 30k tokens*.
GPT-4 cannot do this.
Here is the prompt code for the Turing machine: https://github.com/SpellcraftAI/turing
This is the fully general counterpoint to @VictorTaelin’s A::B challenge (he put his money where his mouth is and got praise for that from Yudkowsky).
That attention is Turing complete was already claimed in 2021, in the paper “Attention is Turing Complete”:
Theorem 6. The class of Transformer networks with positional encodings is Turing complete. Moreover, Turing completeness holds even in the restricted setting in which the only non-constant values in positional embedding pos(n) of n, for n ∈ N, are n, 1/n, and 1/n², and Transformer networks have a single encoder layer and three decoder layers.
Congratulations to Anthropic for getting an LLM to act as a Turing machine, though that particular achievement shouldn’t be surprising. Of greater practical interest is how efficiently it can act as a Turing machine, and how efficiently we should want it to. After all, it’s far more efficient to implement your Turing machine as a few lines of specialized code.
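To make the "few lines of specialized code" point concrete, here is a minimal sketch of a single-tape Turing machine simulator in Python. The rule format, the state names, and the binary-increment machine are all assumptions chosen for illustration; none of them is taken from the linked repository.

```python
from collections import defaultdict

def run_turing_machine(rules, tape, state="start", halt="halt",
                       blank="_", max_steps=10_000):
    """Run a single-tape Turing machine and return the final tape contents.

    rules maps (state, symbol) -> (symbol_to_write, move, next_state),
    where move is -1 (left) or +1 (right). This format is hypothetical.
    """
    # Sparse tape: any cell that was never written reads as blank.
    cells = defaultdict(lambda: blank, enumerate(tape))
    head = 0
    steps = 0
    while state != halt:
        if steps >= max_steps:
            raise RuntimeError("machine did not halt within max_steps")
        write, move, state = rules[(state, cells[head])]
        cells[head] = write
        head += move
        steps += 1
    if not cells:
        return ""
    # Read the touched region left to right and trim surrounding blanks.
    return "".join(cells[i] for i in range(min(cells), max(cells) + 1)).strip(blank)

# Illustrative machine: add one to a binary number (head starts on the leftmost bit).
INCREMENT = {
    ("start", "0"): ("0", +1, "start"),   # scan right to the end of the number
    ("start", "1"): ("1", +1, "start"),
    ("start", "_"): ("_", -1, "carry"),   # step back onto the last bit
    ("carry", "1"): ("0", -1, "carry"),   # 1 + 1 = 0, keep carrying
    ("carry", "0"): ("1", -1, "halt"),    # absorb the carry
    ("carry", "_"): ("1", -1, "halt"),    # overflow into a new leading bit
}

print(run_turing_machine(INCREMENT, "1011"))  # 1100
```

Each step here is one dictionary lookup, whereas an LLM spends a full forward pass per generated token, which is the efficiency gap the paragraph above has in mind.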
On the other hand, the ability to be a (universal) Turing machine could, in principle, be the foundation of the ability to reliably perform complex rigorous calculation and cognition—the kind of tasks where there is an exact right answer, or exact constraints on what is a valid next step, and so the ability to pattern-match plausibly is not enough. And that is what people always say is missing from LLMs.
I also note the claim that “given only existing tapes, it learns the rules and computes new sequences correctly”. Arguably this ability is even more important than the ability to follow rules exactly, since this ability is about discovering unknown exact rules, i.e., the LLM inventing new exact models and theories. But there are bounds on the ability to extrapolate sequences correctly (e.g. complexity bounds), so it would be interesting to know how closely Claude approaches those bounds.
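For contrast, the easiest version of "learning the rules from tapes" can be sketched in a few lines, assuming every step of a run is recorded as a full snapshot of (state, head position, tape). That snapshot format is hypothetical and much more generous than raw tapes; the hard case, where states are hidden or unseen rules must be extrapolated, is exactly where the bounds mentioned above apply.

```python
def infer_rules(trace):
    """Recover transition rules from a fully observed execution trace.

    trace is a list of (state, head, tape) snapshots, where tape is a
    fixed-width string. This snapshot format is an assumption made for
    illustration. Returns a dict mapping
    (state, symbol_read) -> (symbol_written, move, next_state).
    """
    rules = {}
    for (state, head, tape), (next_state, next_head, next_tape) in zip(trace, trace[1:]):
        key = (state, tape[head])
        rule = (next_tape[head], next_head - head, next_state)
        # A deterministic machine must always react the same way to the same key.
        if rules.setdefault(key, rule) != rule:
            raise ValueError(f"inconsistent observations for {key}")
    return rules

# Hand-written trace of a binary-increment machine running on "11".
trace = [
    ("start", 1, "_11_"),
    ("start", 2, "_11_"),
    ("start", 3, "_11_"),
    ("carry", 2, "_11_"),
    ("carry", 1, "_10_"),
    ("carry", 0, "_00_"),
    ("halt", -1, "100_"),
]

for key, rule in infer_rules(trace).items():
    print(key, "->", rule)
```

Only the rules actually exercised by the trace are recovered (here, nothing is learned about reading a 0), so predicting behavior on new inputs requires either more traces or a guess about the missing entries.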
This is not about performance at all. Humans are not good at that either. It is about the ability to learn fully general simulation. It is not exactly coming full circle back to teaching computers math and logic, but it is close. It is more like a spiral up one level: now the LLMs can understand these.