The Limit of Language Models
Epistemic Status
Highlighting a thesis in Janus’ “Simulators” that I think is insufficiently appreciated.
Thesis
In the limit, models optimised for minimising predictive loss on humanity’s text corpus converge towards general intelligence[1].
Preamble
From Janus’ Simulators:
Something which can predict everything all the time is more formidable than any demonstrator it predicts: the upper bound of what can be learned from a dataset is not the most capable trajectory, but the conditional structure of the universe implicated by their sum (though it may not be trivial to extract that knowledge).
Introduction
I affectionately refer to the above quote as the “simulators thesis”. Reading and internalising that passage was an “aha!” moment for me. I was already aware (by July 2020 at the latest) that language models were modelling reality. I was persuaded by arguments of the following form:
Premise 1: Modelling is transitive. If X models Y and Y models Z, then X models Z.
Premise 2: Language models reality. “Dogs are mammals” occurs more frequently in text than “dogs are reptiles” because dogs are in actuality mammals and not reptiles. This statistical regularity in text corresponds to a feature of the real world. Language is thus a map (albeit flawed) of the external world.
Premise 3: GPT-3 models language. That is how it predicts text.
Conclusion: GPT-3 models the external world.
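To make Premise 2 and the conclusion concrete, here is a deliberately tiny sketch (the five-sentence corpus and its counts are invented for illustration): a model fitted purely to text statistics ends up encoding a fact about the external world.

```python
# Toy illustration of Premises 2 and 3: the statistics of text track facts about
# the world, so a model of those statistics encodes a (noisy) map of those facts.
# The corpus below is invented for illustration, not real data.
from collections import Counter

corpus = [
    "dogs are mammals", "dogs are mammals", "dogs are mammals",
    "dogs are loyal", "dogs are reptiles",  # the occasional false sentence
]

# A minimal "language model": P(next word | "dogs are"), estimated by counting.
continuations = Counter(sentence.split()[2] for sentence in corpus)
total = sum(continuations.values())
model = {word: count / total for word, count in continuations.items()}

print(model)  # {'mammals': 0.6, 'loyal': 0.2, 'reptiles': 0.2}
# "mammals" gets the highest probability purely from modelling text, yet that
# preference reflects a fact about the world the text describes.
```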
But I hadn’t yet fully internalised all the implications of what it means to model language, and hence our underlying reality: in particular, the limit that optimisation for minimising predictive loss on humanity’s text corpus will converge to. This post belatedly makes those updates.
Interlude: The Requisite Capabilities for Language Modelling
Janus again:
If loss keeps going down on the test set, in the limit – putting aside whether the current paradigm can approach it – the model must be learning to interpret and predict all patterns represented in language, including common-sense reasoning, goal-directed optimization, and deployment of the sum of recorded human knowledge.
Its outputs would behave as intelligent entities in their own right. You could converse with it by alternately generating and adding your responses to its prompt, and it would pass the Turing test. In fact, you could condition it to generate interactive and autonomous versions of any real or fictional person who has been recorded in the training corpus or even could be recorded (in the sense that the record counterfactually “could be” in the test set).
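The “loss” in question is the average next-token negative log-likelihood on held-out text. A minimal sketch of that objective, with a hypothetical model interface standing in for an actual language model:

```python
# Sketch of the objective being minimised: average -log p(token | prior tokens)
# over a held-out sequence. `model(context)` is a hypothetical interface that
# returns a dict of next-token probabilities; a GPT-style model minimises the
# same quantity, just with a far better `model` and far more data.
import math

def next_token_loss(model, tokens):
    losses = []
    for i in range(1, len(tokens)):
        p = model(tokens[:i]).get(tokens[i], 1e-12)  # floor to avoid log(0)
        losses.append(-math.log(p))
    return sum(losses) / len(losses)

# A trivial baseline that ignores its context entirely; anything that keeps
# pushing this number down must be modelling more of the text's structure.
def unigram_model(context):
    return {"the": 0.05, "cat": 0.01, "sat": 0.005}

print(next_token_loss(unigram_model, ["the", "cat", "sat"]))  # ≈ 4.95
```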
Implications
The limit of predicting text is predicting the underlying processes that generated said text. If said underlying processes are agents, then sufficiently capable language models can predict agent (e.g., human) behaviour to arbitrary fidelity[2]. If it turns out to be the case that the most efficient way of predicting the behaviour of conscious entities (as discriminated via text records) is to instantiate conscious simulacra, then such models may perpetrate mindcrime.
Furthermore, the underlying processes that generate text aren’t just humans, but the world which we inhabit. That is, a significant fraction of humanity’s text corpus reports on empirical features of our external environment or the underlying structure of reality:
Timestamps
And other empirical measurements
Log files
Database files
Including CSVs and similar
Experiment records
Research findings
Academic journals in quantitative fields
Other reports
Etc.
Moreover, such text is often clearly distinguished from other kinds of text (fiction, opinion pieces, etc.) via its structure, formatting, titles, etc. In the limit of minimising predictive loss on such text, language models must learn the underlying processes that generated them — the conditional structure of the universe.
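As a toy illustration of that last point, consider an invented sales log: driving prediction error on the derived column to zero on unseen rows requires recovering the rule that generated it, not just memorising surface statistics.

```python
# Invented structured records: the 'total' field is generated by an underlying
# process (total = price * qty). Predicting it reliably on new rows means
# learning that process, which is the sense in which predicting empirical text
# forces learning the structure behind it.
import csv, io

records = io.StringIO(
    "item,price,qty,total\n"
    "nails,0.10,500,50.00\n"
    "hammer,12.50,2,25.00\n"
    "tape,3.25,4,13.00\n"
)

def predict_total(price, qty):
    return round(price * qty, 2)

for row in csv.DictReader(records):
    assert predict_total(float(row["price"]), int(row["qty"])) == float(row["total"])
```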
The totality of humanity’s recorded knowledge about the world — our shared world model — is a lower bound on what language models can learn in the limit[3]. We would expect that sufficiently powerful language models would be able to synthesise said shared world model and make important novel inferences about our world that are implicit in humanity’s recorded knowledge, but which have not yet been explicitly synthesised by anyone[4].
The idea that the capabilities of language models are bounded by the median human contributor to their text corpus or even the most capable human contributor is completely laughable. In the limit, language models are capable of learning the universe[5].
Text prediction can scale to superintelligence[6].
This is a very nontrivial claim. Sufficiently hard optimisation for performance on most cognitive tasks (e.g. playing Go) will not converge towards selecting for generally intelligent systems (let alone strongly superhuman general intelligences). Text prediction is quite special in this regard.
This specialness suggests that text prediction is not an inherently safe optimisation target; future language models (or simulators more generally) may be dangerously capable[7].
Caveats
Humanity’s language corpus embeds the majority of humanity’s accumulated explicit knowledge about our underlying reality. There does exist knowledge possessed by humans that hasn’t been represented in text anywhere. It is probably the case that the majority of humanity’s tacit knowledge hasn’t been explicitly codified anywhere, and even among the knowledge that has been recorded in some form, a substantial fraction may be hard to access or not be organised/structured in formats suitable for consumption by language models.
I suspect that most useful (purely) cognitive work that humans do is communicated via language to other humans and is thus accessible for learning via text prediction. Most of our accumulated cultural knowledge and our shared world model(s) do seem to be represented in text. However, it’s not necessarily the case that pure text prediction is sufficient to learn arbitrary capabilities of human civilisation.
Moreover, the diversity and comprehensiveness of the dataset a language model is trained on will limit the capabilities it can actually attain in deployment, as will the architecture of whatever model we are training. In other words, the fact that a particular upper bound exists in principle does not mean it will be realised in practice.
Furthermore, the limit of text prediction does not necessarily imply learning the conditional structure of our particular universe, but rather some (minimal?) conditional structure that is compatible with our language corpus. That is, humanity’s language corpus may not uniquely pin down our universe, but only a set of universes of which ours is a member. The aspects of humanity’s knowledge about our external world that are not represented in text may be crucial missing information for uniquely singling out our universe (or even just humanity’s shared model of our universe). Similarly, it may not be possible, even in principle, to learn features of our universe that humanity is completely ignorant of[8].
For similar reasons, it may be possible to predict text generated by conscious agents to arbitrarily high fidelity without instantiating conscious simulacra. That is, humans may have subjective experiences and behaviour that cannot be fully captured or discriminated within language. Any aspects of the human experience/condition that are not represented in text (at least implicitly, given reasonable inductive biases) are underdetermined in the limit of text prediction.
Conclusions
Ultimately, while I grant the aforementioned caveats some weight, and those arguments did update me significantly downwards on the likelihood of mindcrime in sufficiently powerful language models[9], I still fundamentally expect text prediction to scale to superintelligence in the limit.
I think humanity’s language corpus is a sufficiently comprehensive record of humanity’s accumulated explicit knowledge, and a sufficiently rich representation of our shared world model, that arbitrarily high accuracy in predicting text necessarily requires strongly superhuman general intelligence.
Footnotes
[1] Particularly strongly superhuman general intelligence. Henceforth “superintelligence”.
[2] At least to degrees of fidelity that can be distinguished via text.
[3] More specifically, the world model implicit in our recorded knowledge.
[4] A ring theorist was able to coax ChatGPT to develop new nontrivial, logically sound mathematical concepts and generate examples of them. Extrapolating further, I would expect that sufficiently powerful language models will be able to infer many significant novel theoretical insights that could in principle be located given the totality of humanity’s recorded knowledge.
[5] That is, they can learn an efficient map of our universe and successfully navigate said map to make useful predictions about it. Sufficiently capable language models should be capable of e.g. predicting research write-ups, academic reports and similar.
[6] At least in principle, leaving aside whether current architectures will scale that far. Sufficiently strong optimisation on the task of text prediction is in principle capable of creating vastly superhuman generally intelligent systems.
[7] That is, sufficiently powerful language models are capable to a degree that they could, under particular circumstances, be existentially dangerous. I do not mean to imply that they are independently (by their very nature) existentially dangerous.
[8] That is, features of our universe that are not captured, not even implicitly, not even by interpolation/extrapolation, in our recorded knowledge.
[9] This point may not matter much, as future simulators will probably be multimodal. It seems much more likely that the limit of multimodal prediction of conscious agents may necessitate instantiating conscious simulacra. But this post was specifically about the limit of large language models, and I do think the aspects of human experience not represented in text are a real limitation on the suggestion that, in the limit, language models might instantiate conscious simulacra.