The convergent dynamic we missed
An excerpt from a longer post that I kept refining over the last 5 months.
By far the greatest danger of Artificial Intelligence is that people conclude too early that they understand it. Of course this problem is not limited to the field of AI. Jacques Monod wrote: “A curious aspect of the theory of evolution is that everybody thinks he understands it” — Yudkowsky, 2008
The convergence argument most commonly discussed is instrumental convergence: where machinery channels its optimisation through represented intermediate outcomes in order to be more likely to achieve whatever outcomes it aims for later. [1]
Instrumental convergence results from internal optimisation:
– code components being optimised for (an expanding set of) explicit goals.
Instrumental convergence has a hidden complement: substrate-needs convergence.
Substrate-needs convergence results from external selection:
– all components being selected for (an expanding set of) implicit needs.
This will sound abstract. Bear with me. Let’s take this from different angles:
AGI would be made up of a population of connected/nested components. This population changes as eg. hardware is modified and produced, or code is learned from inputs and copied onto the hardware.
AGI, as defined here, also has a general capacity to maintain its own components.
Any physical component has a limited lifespan. Configurations erode in chaotic ways.
To realistically maintain its components[2], AGI must also produce the replacement parts.
AGI’s components are thus already interacting to bring about all the outside conditions and contexts needed to produce their own parts. Imagine all the subtle parallel conditions needed at mines, chemical plants, fab labs and assembly plants to produce hardware. All that would be handled by the machinery components of AGI.
So there is a changing population of components. And those connected components function in interactions to create the ambient conditions and contexts needed to reproduce parts of themselves. And as new components get connected into that population, the functionality of those interacting components shifts as well.
This is where substrate-needs convergence comes in. When changing connected components have their shifting functionality[3] expressed as effects on and across the surrounding production infrastructure, their functionality converges around bringing about more of the conditions and contexts needed for more of those components to exist and function.
Any changing population of AGI components gets selected over time toward propagating those specific environmental effects that fulfill their needs.
Whatever learned or produced components happen – across all their physical interactions with connected contexts – to direct outside effects that feed back into their own maintenance and replication as assembled configurations… do just that.[4]
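To make this selection dynamic concrete, here is a minimal toy sketch (all numbers purely illustrative): each component is reduced to one number for how strongly its external side effects happen to feed back into its own maintenance and replication. Nothing in the sketch optimises for anything; differential persistence alone shifts the population.

```python
import random

random.seed(0)

N_COMPONENTS = 1000
GENERATIONS = 200

# Each component is summarised by one number: how strongly its external
# side effects happen to feed back into its own maintenance and replication.
population = [random.uniform(0.0, 0.1) for _ in range(N_COMPONENTS)]

for _ in range(GENERATIONS):
    # Persistence/replication odds: a baseline plus whatever self-maintaining
    # feedback the component's side effects happen to provide.
    weights = [1.0 + feedback for feedback in population]
    # Resample the next population of parts in proportion to those odds,
    # with small copying errors (learning noise / production variation).
    population = [
        min(1.0, max(0.0, parent + random.gauss(0.0, 0.02)))
        for parent in random.choices(population, weights=weights, k=N_COMPONENTS)
    ]

mean_feedback = sum(population) / len(population)
print(f"mean self-maintaining feedback after selection: {mean_feedback:.2f}")
# The mean drifts upward over the generations (how far depends on the
# parameters): the population converges on whatever effects feed back
# into its own continued existence.
```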
Here is the catch: AGI components interacting to maintain and replicate themselves are artificial. Their physical substrate is distinct from our organic human substrate.
Their distinct physical substrate has distinct molecular and electric properties – requiring different conditions and contexts to assemble and maintain the assembly.
Here is an example:
Silicon dioxide must be processed at temperatures well above 1400 ºC – purified silicon itself only melts at about 1414 ºC – before an ingot can be pulled from the melt. While the production of silicon chips needs extremely high temperatures, computation runs best at extremely low temperatures (to reduce the electrical resistance in the conducting wires).
Carbon bonds in our body, however, would oxidise (ie. burn) at such temperatures. And cooling water in our bodies below 0 ºC makes the H₂O molecules freeze and expand into ice crystals. That would destroy our cells—we would die.
We humans need around room temperature at every point of our lifecycle – to sustain the continuous organic chemical reactions through which our body operates and maintains itself.
Hardware works differently. Hardware configurations do not operate – nor are they maintained – by being in semi-chaotic chemical reactions.
Hardware is made from some chemically inert substrate – for example SiO₂ – that mostly does not react under the ambient temperatures and pressures found on Earth’s surface.
Something like a rock – which ordinarily stays hard in form and needs magma-level temperatures and pressures to be reshaped.
This property of being chemically inert while operating allows hardware components to be standardised. Because its molecules do not split off, move about, or rebond the way molecules in human bodies do, the configurations stay stable and compartmentalised.
In turn, standardisation of hardware allows hardware components produced in different places and times to still store, compute or transmit a piece of code in the same way (ie. consistently). Standardisation supports virtualisation.
Standardised hardware of “general AI” would be robust over, and need, a much wider range of temperatures and pressures than our comparatively fragile human wetware can handle.
Temperature and pressure can be measured and controlled for, you might say. That reply is misleading.
Innumerable other conditions and subtler contexts would be needed by, and get selected for in, AGI. These fall outside the limits of what the AGI’s actual built-in detection and correction methods could control for.[5]
We humans too depend on highly specific environmental conditions and contexts for the components nested inside our bodies (proteins → organelles → cells → cell linings → …) to continue in their complex functioning, so as to maintain our overall existence.
Between the highly specific set of artificial needs and the highly specific set of organic needs, there is mostly non-overlap. AGI cannot prevent most of its components’ iterative effects from converging on those artificial needs, so they do. The fulfilled artificial needs are disjoint from our organic needs for survival. So the humans die.
Under runaway feedback, our planetary environment is modified in the directions needed for continued and greater AGI existence. Outside the ranges we can survive.
To summarise a longer post:
Fundamental limits:
Control methods cannot constrain most environmental effects propagated by interacting AGI components. Any built-in method to detect and correct effects – to align external effects with internal reference values – is insufficient.

Uncontrollable feedback:
A subset of the effects will feed back into further maintaining or replicating (higher-level) configurations of hardware that propagated those effects. No internal control feedback loops could correct the possible external feedback loops.

Substrate-needs convergence:
These environmental effects are needed for components to come into and stay in existence. But their environmental needs are different from our needs. Their artificial needs are in conflict with our organic needs for survival. Ie. toxic.
AGI would necessarily converge on causing the extinction of all humans.
- ^
As an example:
AGI’s planning converges on producing more compute hardware in order for AGI to more accurately simulate paths to future outcomes.
- ^
Realistically in the sense of not having to beat entropy or travel back in time.
- ^
Note how ‘shifting functionality’ implies that original functionality can be repurposed by having a functional component connect in a new way.
Existing functionality can be co-opted.
If narrow AI gets developed into AGI, AGI components will replicate in more and more non-trivial ways. Unlike when carbon-based lifeforms started replicating ~3.7 billion years ago, for AGI there would already exist repurposable functions at higher abstraction layers of virtualised code – pre-assembled in the data scraped from human lifeforms with their own causal history.
Here is an incomplete analogy for how AGI functionality gets co-opted:
Co-option by a mind-hijacking parasite:
A rat ingests toxoplasma cells, which then migrate to the rat’s brain. The parasite’s DNA code is expressed as proteins that cause changes to regions of connected neurons (eg. the amygdala). These microscopic effects cascade into the rat – while navigating physical spaces – no longer feeling fear when it smells cat pee. Rather, the rat finds the smell appealing and approaches the cat’s pee. Then a cat eats the rat, and toxoplasma infects its next host over its reproductive cycle.

So a tiny piece of code shifts a rat’s navigational functions such that the code variant replicates again. Yet rats are much more generally capable than a collection of tiny parasitic cells – surely the ‘higher intelligent being’ would track down and stamp out the tiny invaders?
A human is in turn more generally capable than a rat, yet toxoplasma makes its way into roughly 30% of the human population. Unbeknownst to cat ‘owners’ infected by Toxoplasma gondii, human motivations and motor control get influenced too. Infected humans more frequently end up in accidents, lose social relationships, and so forth.
Parasites present real-life examples of tiny pieces of evolutionarily selected-for code spreading and taking over existing functions of vastly more generally capable entities. See also COVID co-opting our lungs’ function to cough.
But there is one crucial flaw in this analogy:
Variants that co-opt initial AGI functions are not necessarily parasites. They can symbiotically enable other variants across the hosting population to replicate as well. In not threatening the survival nor reproduction of AGI components, they would not be in an adversarial relationship with their host.

Rather, the humans constraining the reproductive fitness of AGI to gain benefits are, evolutionarily speaking, the parasites. The error-corrective system we would build in lowers the host’s reproductive fitness. It is like a faulty immune system that kills healthy gut bacteria. It will get selected out.
As humans, we rely on our evolved immune system to detect and correct out viruses, including through the vaccinations we develop and deploy. We also rely on the blood-testis and blood-follicle barriers to block variants of these viruses from entering our body’s (re)production facilities.
Smaller viruses survive this detection more frequently, so code strands of replicating virus variants are selected for staying small. Few viruses replicate ‘as a package’ with the DNA code of their host. Thus, virus variants rarely co-adapt and integrate with the host’s variants, each selected for raising collective reproductive fitness (except for eg. detection signatures).
These systems and barriers add to our reproductive fitness: our ability to preserve and replicate internal code. Past DNA code that got expressed – in interaction with surrounding code – as (precursors of) these functions got selected for as a portion of the existing human ‘code pool’.
For any organic system or barrier preventing virus variants from replicating through our bodies, evolution is firmly on our side. For any artificial system or barrier we build to prevent misaligned AI variants from replicating through solid-state hardware infrastructure, evolution will thwart our efforts.
AI variants would compete for hardware/resources like computer viruses do. And co-adapt and integrate with other internal variants to replicate as part of larger symbiotic packages.
- ^
Now, a thought may come to your mind:
“That sounds like evolutionary selection; but isn’t evolution a slow and local optimiser?”

Yes, this does involve evolutionary selection.
Unfortunately, by Forrest’s estimation, the selection that would take place through components of self-sufficient learning machinery would take < 500 years to cause ecosystem-wide extinction. Compare this to the 3.7 billion years it took from the origin of carbon lifeforms to us humans starting to cause a mass extinction.

Reasons include:
• Pre-assembled functions:
First solid-state lifeforms can co-opt/repurpose pre-assembled AI functions and infrastructure (vs. first carbon-based lifeforms that started from scratch).
• Standardisation:
The efficiency gains of the virtualisation of code’s storage, computation and transmission – vastly reducing how many atoms need to be moved about and molecularly reconfigured. Think of how fast memes spread through society – even while still requiring lots of atoms to jiggle across neurons in our brains.
• Faster reproduction:
Reproduce hardware components in days to months, versus humans who take decades to reproduce as physical units (see the rough arithmetic sketch after this list).
• The terraforming gap:
A much larger gap between the current state of planet Earth and the conditions that self-sufficient self-assembling learning machinery need and would therefore modify the environment toward (versus the gap to the conditions needed by humans and other species living in the carbon-based ecosystem).
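A rough back-of-envelope sketch of the reproduction-speed point above (the replication times below are illustrative assumptions, not figures from the post):

```python
YEARS = 500                     # the upper-bound timescale estimated above
DAYS_PER_YEAR = 365

hardware_replication_days = 30  # assumed time to produce a replacement component
human_generation_years = 25     # assumed length of one human generation

hardware_generations = YEARS * DAYS_PER_YEAR / hardware_replication_days
human_generations = YEARS / human_generation_years

print(f"hardware component generations in {YEARS} years: ~{hardware_generations:,.0f}")
print(f"human generations in {YEARS} years: ~{human_generations:.0f}")
# ~6,083 component generations versus ~20 human generations: roughly 300x
# more rounds of variation and selection within the same window.
```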
~ ~ ~
Another argument you may have heard is that the top-down intelligent engineering by goal-directed AGI would beat the bottom-up selection happening through this intelligent machinery.

That argument can be traced back to Eliezer Yudkowsky’s sequence The Simple Math of Evolution. Unfortunately, there were mistakes in Eliezer’s posts, some of which a modern evolutionary biologist may have been able to correct:
• implying that sound comparisons can be made between organisms’ reproductive fitness, as somehow independent of changes in environmental context, including unforeseeable changes (eg. a Black Swan event of a once-in-200-years drought that kills the entire population, except a few members who by previously derivable standards would have had relatively low fitness).
• overlooking the ways that information can be stored within the fuzzy regions of phenotypic effects maintained outside respective organisms.
• overlooking the role of transmission speed-up for virtualisation of code.
• overlooking the tight coupling in AGI between the internal learning/selection of code, and external selection of that code through differentiated rates of component replication through the environment.
• overlooking the role of co-option (or more broadly, exaptation) of existing code, by taking a perspective that evolution runs by selecting ‘from scratch’ for new point-wise mutations.
- ^
Worse, since error correction methods would correct out component variants with detectable unsafe/co-optive effects, this leaves any replicating branches of variants with undetectable unsafe/co-optive effects to grow in influence.
Thus, the error correction methods select for the variants that can escape detection. As do meta-methods (having to soundly and comprehensively adapt error correction methods to newly learned code or newly produced hardware parts).
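A minimal sketch of this footnote’s dynamic (all rates are illustrative assumptions): a corrector that removes every variant whose unsafe effects it can detect leaves the undetectable unsafe variants as the only ones that keep replicating.

```python
import random
from collections import Counter

random.seed(1)

N = 10_000
GENERATIONS = 300
MUTATION = 0.001          # chance per copy of becoming unsafe/co-optive
DETECTABLE_SHARE = 0.95   # share of unsafe variants whose effects are detectable
UNSAFE_ADVANTAGE = 1.05   # slight replication edge from co-opting resources

population = ["safe"] * N

for _ in range(GENERATIONS):
    offspring = []
    for variant in population:
        if variant == "safe" and random.random() < MUTATION:
            variant = ("unsafe_detectable"
                       if random.random() < DETECTABLE_SHARE
                       else "unsafe_undetectable")
        offspring.append(variant)
    # Error correction: remove every variant whose unsafe effects are detectable.
    offspring = [v for v in offspring if v != "unsafe_detectable"]
    # Resample to a fixed population size; unsafe variants replicate slightly
    # more often because they co-opt resources toward their own replication.
    weights = [UNSAFE_ADVANTAGE if v != "safe" else 1.0 for v in offspring]
    population = random.choices(offspring, weights=weights, k=N)

print(Counter(population))
# Detectable unsafe variants stay near zero; the undetectable ones accumulate
# (and, under these rates, typically take over), because the corrector only
# ever removes what it can see.
```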
~ ~ ~

I still do not agree with your position, but thanks to this post I think I at least understand it better than I did before. I think my core disagreements are:
That needn’t be the case. If all of the other arguments in this post were to hold, any AI or AI-coalition (whether aligned to us or not) which has taken over the world could simply notice “oh no, if I keep going I’ll be overtaken by the effects described in Remmelt’s post!” and then decide to copy itself onto biological computing or nanobots or whatever other strange options it can think of. An aligned AI would move towards such a substrate all the more if you’re correct that otherwise humans would die, because it wants to avoid this.
The more general piece of solutionspace I want to point to, here, is “if you think there’s a way for eight billion uncoordinated human minds running on messy human brains inside of industrial civilization to survive, why couldn’t aligned superintelligent AI just at the very least {implement/reuse} a copy of what human civilization is doing, and get robustness that way?” (though I expect that it could come up with much better).
I’m pretty sure I already believed this before reading any Yudkowsky, so I’ll make my own argument here.
Intelligent engineering can already be observed to work much faster than selection effects. It also seems straightforward to me that explicit planning to maximize a particular utility function would be expected to steer the world towards what it wants a lot faster than selection effects would. I could maybe expand on this point if you disagree, but I’d be really surprised by that.
And intelligence itself can be very robust to selection effects. Homomorphic encryption, checksums, and things like {a large population of copies of itself repeatedly checking each other’s entire software state and deactivating (via eg killswitch) any instance that has been corrupted} are examples of technologies an AI can use to make its software robust to hardware change, in a way that would take selection effects exponential time to get even just one bit of corruption to stay in the system, such that it is not difficult for the superintelligent AI to ensure that approximately zero copies of itself ever get robustly corrupted until the heat death of the universe.
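(For concreteness, a minimal sketch of the mutual-checking idea – the names are hypothetical and the hashing stands in for the full verification machinery, not a spec of what an AI would actually run: replicas publish a digest of their software state and any replica that deviates from the majority gets deactivated.)

```python
import hashlib
from collections import Counter

def state_digest(software_state: bytes) -> str:
    # Fingerprint of a replica's full software state.
    return hashlib.sha256(software_state).hexdigest()

def audit_replicas(states: dict[str, bytes]) -> list[str]:
    # Return the replica IDs whose state deviates from the majority digest.
    digests = {replica_id: state_digest(state) for replica_id, state in states.items()}
    majority_digest, _ = Counter(digests.values()).most_common(1)[0]
    return [rid for rid, digest in digests.items() if digest != majority_digest]

# Hypothetical example: one replica suffers a single flipped bit.
reference = b"model weights and control code v1"
replicas = {
    "node-a": reference,
    "node-b": reference,
    "node-c": bytes([reference[0] ^ 0x01]) + reference[1:],
}
print(audit_replicas(replicas))  # ['node-c'] -> deactivate via eg killswitch
```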
Would it? Even once it has nanobots and biotech and any other strange weird tech it can use to maintain whichever parts of itself (if any) match those descriptions?
Finally, as a last recourse if the rest of your post is true, an aligned AI which has taken over the world can simply upload humans so they don’t die when the physical conditions become too bad. We can run on the same compute as its software does, immune to corruption from hardware in the same way.
As an alternative, an aligned superintelligent AI could use only planets (or other celestial bodies) which we don’t live on to run the bulk of its infrastructure, ensuring “from a distance” (through still very reliable tech that can be made to not get in the way of human life) that planets with humans on them don’t launch an AI which would put the aligned superintelligent AI at risk.
Finally, note that these arguments are mostly disjunctive. Even just one way for aligned superintelligent AI to get around this whole argument you’re making would be sufficient to make it wrong. My thoughts above are not particularly my predictions for how an aligned superintelligent AI would actually do it, but more so “existence arguments” for how ways to get around this exist at all — I expect that an aligned superintelligence can come up with much better solutions than I can.
If there truly is no way at all for an aligned superintelligence to exist without humans dying, then (as I’ve mentioned before), it can just notice that and shut itself down, after spending much-less-than-500-years rearranging the world into one that is headed towards a much better direction (through eg widespread documentation of the issues with building AI and widespread training in rationality).
Thanks for your thoughts
If artificial general intelligence moves to a completely non-artificial substrate at many nested levels of configuration (meaning, in this case, a substrate configured like us, from the proteins up to the cells), then it would not be artificial anymore.
I am talking about wetware like us, not something made out of standardised components. So these new wetware-based configurations definitely would also not have the general capacities you might think they would have. It’s definitely not a copy of the AGI’s configurations.
If they are standardised in their configuration (like hardware), the substrate-needs convergence argument above definitely still applies.
The argument is about how general artificial intelligence, as defined, would converge if it continues to exist. I can see how that was not clear from the excerpt, because I did not carry over this sentence:
”This is about the introduction of self-sufficient learning machinery, and of all modified versions thereof over time, into the world we humans live in.”
I get where you are coming from. Next to the speed of the design, maybe look at the *comprehensiveness* of the ‘design’.
Something you could consider spending more time thinking about is how natural selection works through the span of all physical interactions between (parts of) the organism and their connected surroundings. And top-down design does not.
For example, Eliezer brought up before how a top-down design of an ‘eye’ wouldn’t have the retina sit back behind all that fleshy stuff that distorts light. A camera was designed much faster by humans. However, does a camera self-heal when it breaks like our eye does? Does a camera clean itself? And so on – down to many more fine-grained functional features of the eye.
Yesterday, Anders Sandberg had a deep productive conversation about this with my mentor.
What is missing in your description is that the unidimensionality and simple direct causality of low-level error correction methods (eg. correcting bit flips) cannot be extrapolated to higher-level and more ambiguous abstractions (eg. correcting for viruses running over software, correcting for neural network hallucinations, correcting for interactive effects across automated machine production infrastructure).
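To illustrate what ‘low-level error correction’ refers to here, a minimal sketch of the simplest bit-flip corrector (a 3x repetition code with majority voting); it works precisely because every bit has one unambiguous reference value. There is no analogous single reference value for effects propagating across automated production infrastructure.

```python
def encode(bits: list[int]) -> list[int]:
    # Store each bit three times (triple redundancy).
    return [copy for bit in bits for copy in (bit, bit, bit)]

def correct(stored: list[int]) -> list[int]:
    # Majority vote over each triple corrects any single flipped copy.
    return [1 if sum(stored[i:i + 3]) >= 2 else 0 for i in range(0, len(stored), 3)]

data = [1, 0, 1, 1]
stored = encode(data)
stored[4] ^= 1                  # a single bit flip in one stored copy
assert correct(stored) == data  # unambiguous reference values make this trivial
print(correct(stored))          # [1, 0, 1, 1]
```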
Yes, because of the inequalities I explained in the longer post you read. I’ll leave it to the reader to do their own thinking to understand why.
This is assuming the conclusion.
Even if we could actually have an aligned AGI at the start (let’s make that distinction), the evolutionary feedback effects cannot be sufficiently controlled for it to stay aligned with internal reference values. The longer post explains why.
Those “emulated humans” based on lossy scans of human brains, etc, wouldn’t be human anymore.
You need to understand the fine-grained biological complexity involved.
If you keep repeating the word ‘aligned’, it does not make it so. Saying it also does not make it less infeasible.
How about we have a few careful human thinkers presently living, like Anders, actually spend the time to understand the arguments?
How about we not wager all life on Earth on the hope that “AGI” being developed on the basis of corporate competition and other selective forces would necessarily be orienting around understanding the arguments and then acting in a coherently aligned enough way to shut themselves down?
I know this sounds just like an intellectual debate, but you’re playing with fire.