BCIs and the ecosystem of modular minds
Crossposted from my personal blog.
Epistemic status: Much more speculative than previous posts, but points towards an aspect of the future that is becoming clearer and which I think is underappreciated at present. If you are interested in any of these thoughts, please reach out.
For many years, the primary AI risk model was one of rapid take-off (FOOM) of a single AI entering a recursive self-improvement loop and becoming utterly dominant over humanity. There were lots of debates about whether this ‘fast-takeoff’ model was correct or whether instead we would enter a slow-takeoff regime. In my opinion, the evidence is pretty definitive that at the moment we are entering a slow-takeoff regime[1], and arguably have been in it for the last few years (historically takeoff might be dated to the release of GPT-3).
The last few years have undoubtedly been years of scaling very large monolithic models. The primary mechanism of improvement has been increasing the size of a monolithic general model. We have discovered that a single large model can outperform many small, specialized models on a wide variety of tasks. This trend is especially strong for language models. We also see a similar trend in image models and other modalities, where large transformer or diffusion architectures work extremely well and scaling them up in both parameter count and data leads to large and predictable gains. However, this scaling era will soon come to at least a temporary end. This is because the cost of training runs and models is rapidly exceeding what companies can realistically spend on compute (and what NVIDIA can produce). GPT-4 cost at least $100M to train. It is likely that GPT-5, or a successor run in the next few years, will cost >$1B. At this scale, only megacap tech companies can afford another OOM, and beyond that there are only powerful nation-states, which seem to be years away. Other modalities such as vision and audio have several more OOMs of scaling to go, but if the demand is there these too can be exhausted within a few years. More broadly, scaling up model training is now a well-understood process that has moved from science to engineering: there now exist battle-tested libraries (both internal to companies and partly open-source) which allow large-scale training runs to be bottlenecked primarily by hardware rather than by sorting out the software and parallelism stack.
Beyond a priori considerations, there are also some direct signals. Sam Altman recently said that scaling will not be the primary mechanism for improvement in the future. Other researchers have expressed similar views. Of course scaling will continue well into the future, and there is also much low-hanging fruit in efficiency improvements, both in terms of parameter efficiency and data efficiency[2]. However, if we do not reach AGI in the next few years, then it seems increasingly likely that we will not reach it in the near future simply by scaling.
If this is true, we will move into a slow-takeoff world. AI technology will still improve, but it will become much more democratized and distributed than at present. Many companies will catch up to the technological frontier, and foundation model inference and even training will increasingly become a commodity. More and more of the economy will be slowly automated, although there will be a lot of lag here due to the large amount of low-hanging fruit, the need for the underlying software stack and business models to mature, and simply because things progress slowly in the real world. AI progress will look a lot more like electrification (as argued by Scott Alexander) than like nuclear weapons or some other decisive technological breakthrough[3].
What will be the outcome of this slow takeoff? A slow takeoff reduces some risks and increases others. A singleton paperclipper simply exterminating humanity becomes much less likely. Instead, the risk is more one of disempowerment leading to the dwindling of humanity as we are outcompeted by our creations in the grand cosmic competition. The key element of a slow takeoff is that it is really the decoupling of humanity from cognitive labour, as opposed to physical labour, which was the hallmark of the first industrial revolution. If left unconstrained, this will inevitably mean the creation and dominance of human-independent cognitive technocapital over us, and the eventual obsolescence of humanity, as the rapid replication and competition of AI systems, scaling up to and operating at the computational Malthusian limit, renders the universe inhospitable to us.
On a slightly more concrete view, this will look like increasingly large ‘populations’ of AI models coming into existence and becoming widely used for various economic tasks. This will include the automation of much of the economy, and especially of middle-class ‘white-collar’ labour. It will also include increasingly autonomous AI agents interacting with each other in complex economic systems with less and less human oversight. Moreover, once we have large interacting populations of these agents with the ability to self-modify and replicate, we have the perfect conditions for evolution to occur. Evolution naturally drives towards powerful replicators, so if the dynamics become evolution-dominated we will see a mushrooming expansion of AI systems until we hit the limits of available compute to run replicators. All the while, economic (and other) competition will be intensifying. Such competition will create an immense ‘speciation’ of AI systems to fill specific economic niches with varying energy and compute-use constraints. Unfortunately, due to our greater energy expense and long generation times, humans will not be able to compete in an evolutionary battle with our AI systems. The evolutionary dynamics of worlds of AI are likely to look much like the evolutionary dynamics of bacteria. AI systems can almost instantly self-replicate to consume any available free energy in the environment. Vast populations of AI agents can be supported with relatively little energy, and ‘generation’ times can be very short. Like bacteria, they are also capable of ‘horizontal gene transfer’, since they can transfer code and weights between one another, and these will ultimately become highly modular. As such, an evolutionarily stable state can be reached in a remarkably short time, and they can be incredibly adaptive to environmental perturbations.
From an even more concrete, ML-centric view: with a modern understanding of ML and neuroscience, it is very clear that ML models have basically cracked many of the secrets of the cortex, and indeed the two look very similar. Both are very large neural networks trained with unsupervised learning objectives. Almost all sensory modalities, including language, are now largely solved. RL is still a bit of a holdout, but it may also succumb to scale and sufficiently complex environments. With modern ML we have built an artificial cortex. We have not yet built a subcortex, and have not made much progress towards such a goal, but this is likely unnecessary for replicating most cognitive functionality and for decoupling the economy from human cognitive labour.
Interestingly, the hardware architecture of brains and machines is showing convergent evolution towards a neuromorphic, highly parallel, vector-matrix substrate (with some important differences). AIs can exceed humans in two ways. Firstly, they can simply scale similar hardware and algorithms to a much greater extent than humans, since their energy usage is essentially unbounded. This includes both parameter count and data inputs. GPT-4 is a good example: it is superior to basically any human in the sheer amount and breadth of its knowledge, as well as at a fair number of linguistic tasks, due to being trained on OOMs more text than a human could ever hope to consume. Secondly, and this will occur more and more in the future, AIs can exploit the fact that they can be hooked into heterogeneous hardware architectures to perform computations in a more specialized manner than the human brain. At the moment, AIs are surprisingly brainlike in that they are just large neural networks running solely on GPUs. However, AIs don’t have to run like this. They can be, and are being, augmented with various tools, can call and run arbitrary code on CPU, and in general can be hooked up to a CPU coprocessor which handles any tasks that need high serial depth. CPU-like programs could also be integrated internally into the architecture itself, although this is currently more far-fetched. Finally, individual NNs can be surrounded by a large amount of scaffolding, as we are beginning to see with LLM agents, to achieve tasks that the LLM cannot achieve directly via prompting. This lets us encode complex decision-making agents, as well as all kinds of natural-language algorithms which simply treat the foundation model as an API call in a larger program. From a hardware perspective, this adds a CPU orchestration layer on top of the neural hardware which can handle a wide range of functionality, including implementing deep serial algorithms like MCTS which are hard to represent natively in highly parallel neural nets. Effectively, with scaffolded NN systems, we humans can hard-code a layer of metacognition, an artificial CPU ‘system 2’, on top of the ‘system 1’ neural net. While our initial attempts at this are crude and unwieldy, it is likely that an artificial ‘system 2’ might actually perform significantly better than one which has to be learnt through RL, like our metacognitive routines encoded in the PFC and implemented in highly parallel, depth-limited hardware [4].
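To make the ‘CPU system 2 on top of a neural system 1’ picture concrete, here is a minimal sketch, assuming nothing about any particular agent framework: an ordinary Python outer loop performs a deep, serial tree search while treating the neural net purely as a subroutine. The functions `propose_moves` and `evaluate` are hypothetical stand-ins for calls to a foundation model, not real APIs.

```python
# A minimal sketch of a 'system 2' orchestration layer: ordinary CPU code doing
# deep, serial search while treating a neural net as a parallel 'system 1'
# subroutine. `propose_moves` and `evaluate` stand in for model calls.

from dataclasses import dataclass, field

@dataclass
class Node:
    state: str                      # e.g. a partial plan or chain of thought
    children: list = field(default_factory=list)
    value: float = 0.0

def propose_moves(state: str, k: int = 3) -> list[str]:
    """Hypothetical wrapper around a model call that suggests k continuations."""
    return [f"{state} -> option {i}" for i in range(k)]   # placeholder

def evaluate(state: str) -> float:
    """Hypothetical wrapper around a model call that scores a state."""
    return float(len(state) % 7)                          # placeholder

def tree_search(root_state: str, depth: int = 3, branch: int = 3) -> str:
    """Deep serial search: easy for CPU scaffolding, hard to do inside the net itself."""
    root = Node(root_state)
    frontier = [root]
    for _ in range(depth):
        next_frontier = []
        for node in frontier:
            for move in propose_moves(node.state, branch):
                child = Node(move, value=evaluate(move))
                node.children.append(child)
                next_frontier.append(child)
        frontier = next_frontier
    return max(frontier, key=lambda n: n.value).state     # best leaf found

print(tree_search("task: plan the experiment"))
```

The point is that the serial, branching part of the computation lives in cheap, reliable CPU code, while all the semantic heavy lifting stays inside the model calls.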
Regardless of how AGI agents are actually implemented, it is likely that such agents will multiply, and that large numbers of them will be created and specialized to fulfill different tasks. Moreover, while the current scaling trend has in general been a powerful force towards generality and centralization, as scaling starts to reach economic limits and slows down, the advantages of specialization will become larger and AI technologies will become more decentralized. For a long time I thought that specialization would come through many small models [5], but I think I miscalculated this. What now seems likely to happen is not a proliferation of small specialized models beating large ones, but instead a proliferation of customizations of base foundation models. This is necessary where we have to handle complex real-world semantic tasks in unrestricted settings, like most natural language generation as well as image generation. Essentially, this is because a large base model is necessary to imbue the right amount of general ‘common sense’ into the model. Given this, however, finetuning has shown an immense ability to customize these models to achieve much more specific tasks than the general model can. Additionally, it is becoming increasingly obvious, thanks to the general linearity of large model latent spaces [6], that these finetunes and other customizations have extremely nice properties such as linear and additive composition. At the moment this compositionality is largely academic, but it is actually vital for building flexible and general systems of cognitive modules which can be combined to enhance foundation models. What this compositionality allows, and what we will increasingly see, is a growing ecosystem of minds, or variations on large foundation models. Cognitive modules will continue to develop and elaborate in tandem with the base models. We can already observe early versions of this springing up in the Stable Diffusion ecosystem. For instance, CivitAI is a site where people can download and mix various finetunes and LoRAs of Stable Diffusion models. These LoRAs have exactly the composability and modularity properties that are necessary: you can create linear weighted sums of LoRAs and they (mostly) work as you would expect, while also being small in terms of memory and easy to create. This has led to people combining large numbers of LoRAs to create specialized models that achieve specific image effects. These techniques will only improve with time, and because the benefits of composability and modularity are so large, they will increasingly be specifically optimized for [7].
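As a toy illustration of the kind of composability being described, consider how LoRAs combine under the standard low-rank-update view: each finetune is a weight delta, and a weighted sum of deltas added to the frozen base weights yields a mixed model. This is only a sketch of the arithmetic; the function and variable names are mine, not the API of any particular library such as `peft` or `diffusers`.

```python
# A minimal sketch of LoRA composition, assuming finetuned behaviour lives in
# low-rank weight deltas. Illustrative names only, not any library's interface.

import numpy as np

def lora_delta(A: np.ndarray, B: np.ndarray, scale: float = 1.0) -> np.ndarray:
    """A LoRA stores a low-rank weight update: delta_W = scale * B @ A."""
    return scale * B @ A

def merge_loras(W_base: np.ndarray, loras: list, weights: list) -> np.ndarray:
    """Linear weighted sum of LoRA deltas added to the frozen base weights."""
    W = W_base.copy()
    for (A, B), w in zip(loras, weights):
        W += w * lora_delta(A, B)
    return W

d, r = 512, 8                        # model dimension and LoRA rank
rng = np.random.default_rng(0)
W_base = rng.normal(size=(d, d))
style_lora = (rng.normal(size=(r, d)), rng.normal(size=(d, r)))   # (A, B) pair
subject_lora = (rng.normal(size=(r, d)), rng.normal(size=(d, r)))

# 70% of one finetune plus 30% of another: the kind of mixing CivitAI users do.
W_mixed = merge_loras(W_base, [style_lora, subject_lora], [0.7, 0.3])
```

Because the deltas add linearly and are tiny relative to the base weights, mixing, scaling, and swapping such modules is cheap, which is exactly what makes an ecosystem of interchangeable customizations possible.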
Beyond its economic implications, the likely fundamental linearity of DL representations has immense importance for the future and makes it clear that there is a path towards merging humans and DL systems. Deep learning does not create intrinsically alien minds. As per the Anna Karenina principle, powerful and general minds are all alike, while every specialized and overfit mind is different in its own way. The representations inside our brains are multimodal, compositional, and linear; so are the latent spaces of large DL models. AI systems will, and already do, learn and represent moral and ethical values in the same way that we do. Vitally, if this hypothesis is even partly true, it should be fairly straightforward to interface directly between the latent states of brains and DL systems. Ultimately, these states are both of the same ‘type’, up to a transformation. Embeddings are the universal interface. These mappings even seem likely to be mostly linear.
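As a toy version of what ‘the same type up to a (mostly linear) transformation’ would mean operationally, the sketch below fits a linear map between two latent spaces from paired embeddings of the same items. It uses entirely synthetic data with a linear ground truth baked in, so it illustrates only the procedure, not the empirical claim about brains or any particular model.

```python
# A toy sketch of interfacing two latent spaces: given paired embeddings of the
# same items from two systems, fit a linear map A -> B by least squares.
# Synthetic data only; the 'ground truth' linearity is an assumption.

import numpy as np

rng = np.random.default_rng(0)
n, d_a, d_b = 1000, 64, 48            # paired samples, dims of space A and space B

Z_a = rng.normal(size=(n, d_a))       # embeddings of items in system A
M_true = rng.normal(size=(d_a, d_b))  # hidden relation between the two spaces
Z_b = Z_a @ M_true + 0.01 * rng.normal(size=(n, d_b))   # same items in system B

# Ordinary least-squares fit of a linear map from space A to space B.
M_hat, *_ = np.linalg.lstsq(Z_a, Z_b, rcond=None)

# If the linearity hypothesis holds, the fitted map transfers to unseen items.
Z_a_new = rng.normal(size=(10, d_a))
print(np.abs(Z_a_new @ M_hat - Z_a_new @ M_true).mean())  # small residual error
```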
BCI technologies will be a tremendous end-game capability, essentially spurring humanity onto its final level of development and its ultimate merging with the AI systems we create. Ultimately, such a merging is necessary if we are to compete with, and develop alongside, independent AI systems [8]. There is so much in the universe that is barred from us by our intrinsic hardware constraints. To fully develop our potential we must transcend these limits.
If we can get BCI tech working, which seems feasible within a few decades, then it will actually be surprisingly straightforward to merge humans with DL systems. This will allow a lot of highly futuristic-seeming abilities, such as:
1.) The ability to directly interface and communicate with DL systems using your mind. Integrate new modalities, or enhance your sensory cortices with artificially trained ones. Interface with external sensors to incorporate them into your phenomenology. Instant internal language translation by adding language models trained on other languages.
2.) A Matrix-like ability to ‘download’ packages of skills and knowledge into your brain. DL systems will increasingly have this ability by default. There will exist finetunes and DL models which encode the knowledge for almost any skill and which can be directly integrated into your latent space.
3.) Direct ‘telepathic’ communication with both DL systems and other humans.
4.) Exact transfer of memories and phenomenological states between humans and DL systems.
5.) Connecting animals to DL systems and to us, letting us telepathically communicate with animals.
6.) Ultimately this will enable the creation of highly networked hive-mind systems/societies of combined DL and human brains.
Essentially, what BCI will let us do is integrate our minds directly into our increasingly networked civilization. This will be a massive transition equivalent to the invention of language. Language enables communication between minds to occur, but only in a discrete and highly compressed symbol space which is slow, unnatural, and requires us to independently learn our own encoders and decoders. BCIs will let us communicate in the natural language of minds—latent space embeddings. This will let us both achieve extremely high-bandwidth telepathy and experience sharing, as well as instant downloads of skills and knowledge. Embeddings are the universal interface, and we will finally be able to communicate with them. As I see it, the key and only fundamental bottleneck is the reading and writing bandwidth. The brain is not intrinsically harder to understand than a large deep network. They can almost certainly interface nicely with each other with even fairly simple encoders and decoders. The universal language of both is the same. If we could just get the bandwidth.
Ultimately, such a transition is inevitable for humanity if we are to survive in a slow-takeoff world into the far, or even medium, future. Human intelligence and ability are fundamentally limited by various constraints on the hardware of our brains and bodies, and to compete with our AI creations we must eventually lift these limitations. Even if we align all systems to the extent that humanity remains fully in control, it is likely that we will still desire to augment ourselves and our intelligence in these ways, which simply reintroduces the problem of competition. Whether ‘natural’ humans end up competing with the AIs that we build or with our own posthuman descendants makes very little difference in the long run. Selection and competition are ever present and will naturally warp value schemes towards whatever is adaptive in a given scenario. The only way to avoid this and lock in our desired values is to prevent evolution from having significant effects on future populations. This will require eliminating one or more of variation, selection, or differential reproduction.
Getting to this point requires obtaining pivotal control over large-scale societal dynamics in a world undergoing rapid technological transition. This can occur in one of two ways: either we design a singleton powerful enough to control everything and hope that we instill it with the right values, or we augment and merge ourselves with AI systems and collectively ensure that human control over the future is maintained by preventing the emergence of large AI populations with the ingredients for rapid and uncontrolled evolution. To me, it has become increasingly clear that neither of these is the default outcome. There is much recent progress in ML which seems to suggest that alignment, at least at near-human to slightly superhuman levels, is much easier than expected (see other posts on this). It seems possible to me that, given a reasonable amount of time and iteration, we might actually crack reliable alignment even of a singleton powerful enough to gain a decisive strategic advantage. The problem is that it seems unlikely that we will end up in a singleton world. Creation of a singleton depends quite strongly on a powerful RSI process, for which there seems, as yet, to be little evidence. Moreover, DL systems seem much less likely to be able to undergo a rapid RSI event than the AI designs speculated about before DL. In the case of ‘scaffolded’ LLMs, where much of the high-level ‘system 2’ behaviour is actually hardcoded in the outer loop that humans (and the AI) can program, we may be able to get faster RSI on the scaffolding than on the underlying DL model, because iterating on software is quick, and large neural network systems are much more like hardware than software. On the other hand, if the AGI model is some end-to-end RL-trained network, then RSI will require training up successors to itself, which will come at a substantial compute and time cost. This will slow down the RSI iteration time and ultimately reduce the speed of takeoff to a slow one.
In a slow-takeoff world, a singleton is much less likely, at least in the preliminary stages. It is very hard to create a decisive strategic advantage and there will be lots of competition. Moreover, slightly worse (and maybe much cheaper) AI systems will also be highly competitive. This strongly incentivises the creation of large, competing populations of AI agents, which will ultimately mean humanity’s disempowerment, although not the immediate death that a misaligned singleton would bring.
Just as a state can only survive with a monopoly on force, humanity can only maintain control with a monopoly on agency. Non-agentic AI systems fortunately seem to be straightforward to build and do not appear to be intrinsically dangerous, even at superhuman levels such as GPT-4. We have detected very little evidence of things like mesa-optimization or inner misalignment at this scale, and I would argue that this is because these are hard to develop for DL systems trained with gradient descent on an objective, such as autoregressive loss on random token contexts, which is randomized and causally decoupled from the outside world. Unfortunately, imbuing these systems with minimal agency appears fairly straightforward (just an outer loop) and is certainly much less expensive than creating the non-agentic base models in the first place. Dealing correctly with this challenge will perhaps be the crux for the outcome of the singularity. Do we keep agency mostly confined to ourselves (likely strongly augmented) and to agents we closely control, or do we have a substantial population of multiplying and evolving ‘wild agents’ on the loose?
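To underline how low the barrier is, here is a sketch of such an outer loop; `model` and `execute` are hypothetical placeholders rather than any real API, and the entire agentic wrapper is a few lines around an otherwise non-agentic model.

```python
# A minimal sketch of 'just an outer loop': a handful of lines turns a stateless
# next-token predictor into a goal-pursuing agent. Placeholder functions only.

def model(prompt: str) -> str:
    """Stand-in for a call to a non-agentic foundation model."""
    return "ACTION: search('current price') DONE"          # placeholder

def execute(action: str) -> str:
    """Stand-in for actually running the proposed action in the world."""
    return "OBSERVATION: price is 101.3"                    # placeholder

def agent_loop(goal: str, max_steps: int = 5) -> list[str]:
    history = [f"GOAL: {goal}"]
    for _ in range(max_steps):
        action = model("\n".join(history))   # model proposes the next action
        observation = execute(action)        # outer loop acts and observes
        history += [action, observation]
        if "DONE" in action:
            break
    return history

print(agent_loop("find the current price"))
```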
Success here is unlikely to be absolute but instead will look like a constant battle against entropy beginning soon and lasting for the foreseeable future. A large population of evolving wild agents is the high entropy state of the universe. The potential energy well is deep and we must erect a large barrier to this state. From evolution’s perspective, humanity is quite a poor local minimum. But we like this minimum and want to stay here. We know what we need to do; the real question is whether we can implement and maintain our safeguards against constant and increasing entropy.
[1] Although the operationalization of ‘slow-takeoff’ in these discussions in terms of GDP growth was always exceptionally bad.
[2] There is still a reasonable chance that at some point scaling will break through some qualitative barrier and initiate a rapid recursive-self-improvement (RSI) loop leading to a takeoff and singleton. I do think though that the balance of evidence from current ML is that this is unlikely.
[3] Of course, unlike electrification and prior industrial revolutions, AI technology will not simply decouple humans from physical labour, but from cognitive labour. Most of our economy will be affected (though not all: professions that still require human connection and social skills will likely not be fully automated, along with a lot of more complex manual labour and some cognitive tasks). Ultimately, humans will become decoupled from the actual economy and thus largely superfluous, a scenario I explore here.
[4] In the brain, deep serial algorithms can be implemented, but only through conscious, recurrent processing. This is slow, in that it requires many iterations of the network (each taking several hundred milliseconds for the human brain) to accomplish algorithms of any serial depth, as well as extremely taxing and error-prone, as anybody who has tried to mentally simulate a deep tree-search algorithm in their head can attest. This difficulty is inherent to the hardware architecture of the human brain, and current large neural networks such as LLMs show very similar behaviours.
[5] For very specialized tasks which do not rely on general reasoning or world knowledge, we will still likely see small models win out in the long run. These models will become more like specialized software libraries and will come to resemble code more than black-box ML models as we improve their robustness and general characteristics. Now that large and general foundation models exist, there is also the potential to start with a small specialized model and only fall back to the large one if the small one fails. Since the small model will be fast, this will not even increase total latency to a large extent. Having a fallback allows efficient exploration of use-cases for which a small model alone might be too unreliable, but always using the large model would be too expensive or slow.
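Schematically, the fallback pattern might look like the sketch below; the model callables and the fixed confidence threshold are illustrative placeholders of mine, not a real routing implementation.

```python
# A minimal sketch of the small-model-with-fallback (cascade) pattern.
# `small_model` and `large_model` are hypothetical stand-ins, and the confidence
# check is deliberately simplistic; real systems would use calibrated scores.

from typing import Callable

def cascade(query: str,
            small_model: Callable[[str], tuple],
            large_model: Callable[[str], str],
            threshold: float = 0.9) -> str:
    """Try the cheap specialized model first; fall back to the large general one."""
    answer, confidence = small_model(query)
    if confidence >= threshold:
        return answer                 # fast path: specialized model was confident
    return large_model(query)         # slow path: pay for generality only when needed

# Example usage with trivial placeholder models:
print(cascade("2 + 2 = ?",
              small_model=lambda q: ("4", 0.95),
              large_model=lambda q: "4 (verified by the large model)"))
```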
[6] Of course, if this is true even approximately, there is then the fascinating question of why we should expect linear compositional vector spaces to be a natural, and even ubiquitous, means of representing concepts. While I am still highly uncertain about this, my mainline reason is that linear spaces allow parameterizable, composable latent dimensions. If some specific feature or concept is represented as a direction in a linear latent space, then the sizes of directions can correspond to degrees of that feature/concept, and sums of directions represent sums of concepts. This allows for a very straightforward and natural approach to generalization and to compressed representation of example information. Such a representation is highly incentivised when there are many, many examples which are slightly different but which sample fairly densely along specific feature/concept dimensions. I would argue that these properties are common across many large-scale real-world datasets, where many concepts/features really do come with a magnitude and with some linear way of combining features at some abstract level. Thus, linear representations are highly encouraged by large and diverse datasets: it is much more efficient, from an information-theoretic perspective, to learn a representation which can be easily parametrized in terms of simple directions and distances than to learn either a disjoint set of complex regions which must be memorized separately, or some complex nonlinear manifold. Having concepts on a nonlinear manifold could work, but would place additional burdens on the encoder and decoder to map changes along this manifold correctly. Basically, linearity has intrinsically nice properties which are helpful to the model as well as to us, and the benefits of parameterizable compositionality and modularity tend to increase as more complex and heterogeneous data must be predicted.
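As a concrete toy of this ‘concepts as directions’ picture (synthetic vectors, not extracted from any real model): magnitudes along a direction encode degrees of a feature, and directions compose additively.

```python
# A toy sketch of features as directions in a linear latent space.
# Synthetic vectors only; not derived from any actual model.

import numpy as np

rng = np.random.default_rng(0)
d = 128

embedding = rng.normal(size=d)                  # representation of some input
formality = rng.normal(size=d); formality /= np.linalg.norm(formality)
positivity = rng.normal(size=d); positivity /= np.linalg.norm(positivity)

# Degrees of a concept correspond to magnitudes along its direction,
# and sums of directions correspond to combined concepts.
steered = embedding + 2.0 * formality + 0.5 * positivity

# Reading a feature back out is (approximately) a projection onto its direction,
# up to whatever component the base embedding already had along it.
print(steered @ formality, steered @ positivity)
```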
[7] While potentially allowing us much greater control and alignability of our AI systems, this modularity also enables greater evolvability of the underlying AI populations.
[8] A key issue with the world and humanity’s defense against slow-takeoff AGI is a lack of communication and coordination. BCI tech could straightforwardly turn many humans into ‘supercoordinators’ and move us towards a more hive-mind-like species: for good or for ill, and inevitably some combination of both.