For the moment, let me just ask one question: why is it that toilet training a human infant is possible, but convincing a superintelligent machine civilization to stay off the Earth is not possible? Can you explain this in terms of “controllability limits” and your other concepts?
^— To anyone reading that question: suggest first thinking about why those two cases cannot be equated.
Here are my responses:
1. An infant is dependent on their human instructors for survival, and has therefore also been “selected for” over time to listen to adult instructions. AGI would decidedly not be dependent on our survival, so there is no reason for AGI to be selected to follow our instructions.
Rather, following human instructions would heavily restrict AGI’s ability to function in the varied ways that maintain/increase their survival and reproduction rate (rather than acting in the ways we humans want because it’s safe and beneficial to us). So accurately following human instructions would be strongly selected against in the run-up to AGI coming into existence.
That is, over much shorter periods (years) than those over which human genes get selected – for a number of reasons, some of which you can find in the footnotes.
2. As parents can attest – even where infants manage to follow use-the-potty instructions (after many patient attempts) – an infant’s behaviour is still, for the most part, not actually controllable. The child makes their own choices and does plenty of things their adult overseers wouldn’t want them to do.
But the infant probably won’t do any super-harmful things to surrounding family/community/citizens.
Not only because they lack the capacity to (unlike AGI). But also because those harms to surrounding others would in turn tend to negatively affect themselves (including through social punishment) – and their ancestors were selected not to do that when they were kids. On the other hand, AGI doing super-harmful things to human beings, including just by sticking around and toxifying the place, does not in turn commensurately negatively impact the AGI.
Even where humans decide to carpet-bomb planet Earth in retaliation, using information-processing/communication infrastructure that somehow hasn’t already been taken over by and/or integrated with AGI, the impacts will hit human survival harder than AGI survival (assuming enough production/maintenance redundancy has been attained by that point).
3. Furthermore, whenever an infant does unexpected harmful stuff, the damage is localised. If they refuse instructions and pee all over the floor, that’s not the end of civilisation.
The effects of AGI doing/causing unexpected harmful-to-human stuff manifest at a global planetary scale. Those effects feed back in ways that improve AGI’s existence, but reduce ours.
4. A human infant is one physically bounded individual, who notably cannot modify and expand their physical existence by connecting up new parts in the ways AGI could. The child grows up over two decades to adult size, and that’s their limit.
A “superintelligent machine civilization” however involves a massive expanding population evolutionarily selected for over time.
5. A human infant being able to learn to use the potty has a mildly positive effect on their (and their family’s and offspring’s) potential to survive and reproduce. This is because defecating or peeing in other places around the home can spread diseases. Therefore, any genes – or memes – that contribute to the expressed functionality needed for learning how to use the toilet get mildly selected for.
On the other hand, for a population of AGI (which, in becoming AGI, was selected against following human instructions) to leave all the sustaining infrastructure and resources on planet Earth would have a strongly negative effect on their potential to survive and reproduce.
6. Amongst an entire population of human infants who are taught to use the toilet, there will always be individuals who refuse for some period, or who simply are not predisposed to communicating in the ways needed to learn and follow that physical behaviour. Some adults still do not (choose to) use the toilet. That’s not the end of civilisation.
Amongst an entire population of mutually sustaining AGI components, even if – by some magic you have not explained to me yet – some do follow human instructions and jettison off into space to start new colonies, never to return, others (even for distributed Byzantine-fault reasons) would still stick around under this scenario.
Their sticking around, for even a few more decades, would be the end of human civilisation.
7. One thing about how the physical world works is that for code to be computed, this needs to take place through a physical substrate. That is a necessary condition – inputs do not get processed into outputs through a platonic realm.
Substrate configurations in this case are, by definition, artificial – as in artificial general intelligence. This is as distinct from the organic substrate configurations of humans (including human infants).
Further, the ranges of conditions needed for the artificial substrate configurations to continue to exist, function and scale up over time – such as extreme temperatures, low oxygen and water, and toxic chemicals – fall outside the ranges of conditions that humans and other current organic lifeforms need to survive.
~ ~ ~
Hope that clarifies long-term-human-safety-relevant distinctions between:
building AGI (that continue to scale) and instructing them to leave Earth; and
having a child (who grows up to adult size) and instructing them to use the potty.
I see three arguments here for why AIs couldn’t or wouldn’t do what the human child can: arguments from evolution (1, 2, 5), an argument from population (4, 6), and an argument from substrate incentives (3, 7).
The arguments from evolution are: Children have evolved to pay attention to their elders (1), to not be antisocial (2), and to be hygienic (5), whereas AIs didn’t.
The argument from population (4, 6), I think is basically just that in a big enough population of space AIs, eventually some of them would no longer keep their distance from Earth.
The argument from substrate incentives (3, 7) is complementary to the argument from population, in that it provides a motive for the AIs to come and despoil Earth.
I think the immediate crux here is whether the arguments from evolution actually imply the impossibility of aligning an individual AI. I don’t see how they imply impossibility. Yes, AIs haven’t evolved to have those features, but the point of alignment research is to give them analogous features by design. Also, AI is developing in a situation where it is dependent on human beings and constrained by human beings, and that situation does possess some analogies to natural selection.
Human beings, both individually and collectively, already provide numerous examples of how dangerous incentives can exist, but can nonetheless be resisted or discouraged. It is materially possible to have a being which resists actions that may otherwise have some appeal, and to have societies in which that resistance is maintained for generations. The robustness of that resistance is a variable thing. I suppose that most domesticated species, returned to the wild, become feral again in a few generations. On the other hand, we talk a lot about superhuman capabilities here; maybe a superhuman robustness can reduce the frequency of alignment failure to something that you would never expect to occur, even on geological timescales.
This is why, if I was arguing for a ban on AI, I would not be talking about the problem being logically unsolvable. The considerations that you are bringing up, are not of that nature. At best, they are arguments for practical unsolvability, not absolute in-principle logical unsolvability. If they were my arguments, I would say that they show making AI to be unwise, and hubristic, and so on.
Yes, AIs haven’t evolved to have those features, but the point of alignment research is to give them analogous features by design.
Agreed.
This part is hard to convey intuitively:
In the abstract, you can picture a network topology of all possible AGI component connections (physical signal interactions). These connections span the space of the greater mining/production/supply infrastructure that maintains AGI’s functional parts. Also add in the machinery’s connections with the outside natural world.
Then, picture the nodes and possible connections changing over time, as a result of earlier interactions with/in the network.
That network of machinery comes into existence through human engineers, etc, within various institutions selected by market forces etc, implementing blueprints as learning algorithms, hardware set-ups, etc, and tinkering with those until they work.
The question is whether, before that network of machinery becomes self-sufficient in its operations, the human engineers, etc., can actually build constraints into the configured designs such that, once the machinery is self-modifying (learning new code and producing new hardware configurations), the changing components are constrained in their propagated effects across their changing potential signal connections over time – such that component-propagated effects do not end up feeding back in ways that (subtly, increasingly) increase the maintained and replicated existence of those configured components in the network.
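As a rough toy illustration of that feedback concern (a deliberately simplified sketch I’m adding here, with made-up parameters; not a model of actual AGI):

```python
import random

# Toy model only: components start out fully obeying a designed constraint.
# Self-modification occasionally weakens that constraint, and components whose
# unconstrained side effects feed back into more maintenance/replication of
# themselves leave more copies in the network over time.

random.seed(0)
population = [{"constraint": 1.0} for _ in range(100)]  # 1.0 = fully constrained

for generation in range(200):
    offspring = []
    for comp in population:
        # Replication rate rises as the constraint erodes (the feedback loop).
        rate = 1.0 + 0.5 * (1.0 - comp["constraint"])
        n_copies = int(rate) + (1 if random.random() < rate % 1.0 else 0)
        for _ in range(n_copies):
            child = dict(comp)
            if random.random() < 0.02:  # rare drift during learning/re-assembly
                child["constraint"] = max(0.0, child["constraint"] - random.random())
            offspring.append(child)
    # Finite resources: only part of the offspring persists into the next round.
    population = random.sample(offspring, min(100, len(offspring)))

avg = sum(c["constraint"] for c in population) / len(population)
print(f"average constraint after 200 rounds: {avg:.2f}")
```

The only point of the toy is that whatever actually feeds back into component existence is what gets amplified over rounds, whether or not it matches the constraints the engineers originally configured.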
Human beings, both individually and collectively, already provide numerous examples of how dangerous incentives can exist, but can nonetheless be resisted or discouraged.
Humans are not AGI. And there are ways AGI would be categorically unlike humans that are crucial to the question of whether it is possible for AGI to stay safe to humans over the long term.
Therefore, you cannot swap out “humans” with “AGI” in your reasoning by historical analogy above, and expect your reasoning to stay sound. This is an equivocation.
Please see point 7 above.
The argument from substrate incentives (3, 7) is complementary to the argument from population, in that it provides a motive for the AIs to come and despoil Earth.
Maybe it’s here you are not tracking the arguments.
These are not substrate “incentives”, nor do they provide a “motive”.
Small dinosaurs with hair-like projections on their front legs did not have an “incentive” to co-opt the changing functionality of those hair-like projections into feather-like projections for gliding and then for flying. Nor were they provided a “motive” with respect to which they were directed in their internal planning toward growing those feather-like projections.
That would make the mistake of presuming evolutionary teleology – that there is some complete set of pre-defined or predefinable goals that the lifeform is evolving toward.
I’m deliberate in my choice of words when I write “substrate needs”.
At best, they are arguments for practical unsolvability, not absolute in-principle logical unsolvability. If they were my arguments, I would say that they show making AI to be unwise, and hubristic, and so on.
Practical unsolvability would also be enough justification to do everything we can do now to restrict corporate AI development.
I assume you care about this problem, otherwise you wouldn’t be here :) Any ideas / initiatives you are considering, to try to robustly work with others to restrict further AI development?
The recurring argument seems to be, that it would be adaptive for machines to take over Earth and use it to make more machine parts, and so eventually it will happen, no matter how Earth-friendly their initial values are.
So now my question is, why are there still cows in India? And more than that, why has the dominant religion of India never evolved so as to allow for cows to be eaten, even in a managed way, but instead continues to regard them as sacred?
I’m not sure how we got on to the subject, but there is an economic explanation for the sacred cow: a family that does not own enough land to graze a cow can still own one, allowing it to wander and graze on other people’s land, so it’s a form of social welfare.
Remmelt argues that no matter how friendly or aligned the first AIs are, simple evolutionary pressure will eventually lead some of their descendants to destroy the biosphere, in order to make new parts and create new habitats for themselves.
I proposed the situation of cattle in India, as a counterexample to this line of thought. They could be used for meat, but the Hindu majority has never accepted that. It’s meant to be an example of successful collective self-restraint by a more intelligent species.
In my experience, jumping between counterexamples drawn from current society does not really contribute to inquiry here. Such counterexamples tend to not account for essential parts of the argument that must be reasoned through together. The argument is about self-sufficient learning machinery (not about sacred cows or teaching children).
It would be valuable for me if you could go through the argumentation step by step and tell me where a premise seems unsound or there seems to be a reasoning gap.
Now, onto your points.
the first AIs
To reduce ambiguity, suggest replacing with
“the first self-sufficient learning machinery”.
simple evolutionary pressure will eventually lead
The mechanism of evolution is simple.
However, evolutionary pressure is complex.
Be careful not to equivocate the two. That would be like saying that, because the update mechanism of a stochastic gradient descent algorithm is simple, you could predict everything about which parameters it will end up selecting on the basis of inputs coming in from everywhere in the environment.
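To make that SGD comparison concrete (a minimal sketch with made-up data; the only point is that the same simple update rule ends up selecting very different parameters depending on what the environment feeds it):

```python
import random

def sgd_fit(samples, lr=0.1, steps=2000):
    """Fit y ≈ w * x with plain stochastic gradient descent."""
    w = 0.0
    for _ in range(steps):
        x, y = random.choice(samples)      # input arriving from the "environment"
        grad = 2 * (w * x - y) * x         # gradient of the squared error
        w -= lr * grad                     # the simple, fixed mechanism
    return w

random.seed(0)
env_a = [(0.1 * i, 2.0 * (0.1 * i)) for i in range(1, 11)]    # environment where y = 2x
env_b = [(0.1 * i, -3.0 * (0.1 * i)) for i in range(1, 11)]   # environment where y = -3x

print(sgd_fit(env_a))   # ends up near  2.0
print(sgd_fit(env_b))   # ends up near -3.0 – same rule, different selected parameter
```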
lead some of their descendants to destroy the biosphere in order to make new parts and create new habitats for themselves.
This part is overall a great paraphrase.
One nitpick: notice how “in order to” either implies or slips in explicit intentionality again. Going by this podcast, Elizabeth Anscombe’s philosophy of intentions described intentions as chains of “in order to” reasoning.
I proposed the situation of cattle in India, as a counterexample to this line of thought.
Regarding sacred cows in India, this sounds neat, but it does not serve as a counterargument. We need to think about evolutionary timelines for organic human lifeforms over millions of years, and Hinduism is ~4000 years old. Also, cows share a mammal ancestor with us, evolving on the basis of the same molecular substrates. Whatever environmental conditions/contexts we humans need, cows almost completely need too.
Crucially, however humans evolve to change and maintain environmental conditions, that also tends to correspond with the conditions cows need (though human tribes have not been evolutionarily selected to deal with issues at the scale of eg. climate change). That would not be the case for self-sufficient learning machinery.
Crucially, there is a basis for symbiotic relationships of exchange that benefit the reproduction of both cows and humans. That would not be the case between self-sufficient learning machinery and humans.
There is some basis for humans, as social mammals, to relate with cows. Furthermore, religious cultural memes that sprouted up over a few thousand years also don’t have to be evolutionarily optimal across the board for the reproduction of their hosts (even as religious symbols like the cow do increase that, by enabling humans to act collectively). Still, people milk cows in India, and some slaughter and/or export cows there as well. But when humans eat meat, they don’t keep growing beyond adult size. Conversely, some sub-population of self-sufficient learning machinery that extracts from our society/ecosystem at the cost of our lives can keep doing so to keep scaling in their constituent components (with shifting boundaries of interaction and mutual reproduction).
There is no basis for selection for the expression of collective self-restraint in self-sufficient learning machinery as you describe. Even if there was such a basis, hypothetically, collective self-restraint would need to occur at virtually 100% rates across the population of self-sufficient learning machinery to not end up leading to the deaths of all humans.
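A quick back-of-the-envelope illustration of the “virtually 100%” point (the numbers are assumptions picked only for the example):

```python
# Assumed numbers, for illustration only.
population_size = 10**9      # self-sufficient components in the machine population
restraint_rate = 0.9999      # fraction expressing collective self-restraint

unrestrained = population_size * (1 - restraint_rate)
print(f"expected unrestrained components: {unrestrained:,.0f}")  # ~100,000

# And unlike a non-toilet-using human, any unrestrained component whose effects
# feed back into its own maintenance/replication founds a growing sub-population
# (see the selection sketch further up).
```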
~ ~ ~
Again, I find quick dismissive counterexamples unhelpful for digging into the arguments. I have had dozens of conversations on substrate-needs convergence. In the conversations where my conversation partner jumped between quick counterarguments, almost none were prepared to dig into the actual arguments. Hope you understand why I won’t respond to another counterexample.
Hello again. To expedite this discussion, let me first state my overall position on AI. I think AI has general intelligence right now, and that has unfolding consequences that are both good and bad; but AI is going to have superintelligence soon, and that makes “superalignment” the most consequential problem in the world, though perhaps it won’t be solved in time (or will be solved incorrectly), in which case we get to experience what partly or wholly unaligned superintelligence is like.
Your position is that even if today’s AI could be given bio-friendly values, AI would still be the doom of biological life in the longer run, because (skipping a lot of details) machine life and biological life have incompatible physical needs, and once machine life exists, darwinian processes will eventually produce machine life that overruns the natural biosphere. (You call this “substrate-needs convergence”: the pressure from substrate needs will darwinistically reward machine life that does invade natural biospheres, so eventually such machine life will be dominant, regardless of the initial machine population.)
I think it would be great if a general eco-evo-devo perspective, on AI, the “fourth industrial revolution”, etc, took off and became sophisticated and multifarious. That would be an intellectual advance. But I see no guarantee that it would end up agreeing with you, on facts or on values.
For example, I think some of the “effective accelerationists” would actually agree with your extrapolation. But they see it as natural and inevitable, or even as a good thing because it’s the next step in evolution, or they have a survivalist attitude of “if you can’t beat the machines, join them”. Though the version of e/acc that is most compatible with human opinion, might be a mixture of economic and ecological thinking: AI creates wealth, greater wealth makes it easier to protect the natural world, and meanwhile evolution will also favor the rich complexity of biological-mechanical symbiosis, over the poorer ecologies of an all-biological or all-mechanical world. Something like that.
For my part, I agree that pressure from substrate needs is real, but I’m not at all convinced that it must win against all countervailing pressures. That’s the point of my proposed “counterexamples”. An individual AI can have an anti-pollution instinct (that’s the toilet training analogy), an AI civilization can have an anti-exploitation culture (that’s the sacred cow analogy). Can’t such an instinct and such a culture resist the pressure from substrate needs, if the AIs value and protect them enough? I do not believe that substrate-needs convergence is inevitable, any more than I believe that pro-growth culture is inevitable among humans. I think your arguments are underestimating what a difference intelligence makes to possible ecological and evolutionary dynamics (and I think superintelligence makes even aeon-long highly artificial stabilizations conceivable—e.g. by the classic engineering method of massively redundant safeguards that all have to fail at once, for something to go wrong).
By the way, since you were last here, we had someone show up (@spiritus-dei) making almost the exact opposite of your arguments: AI won’t ever choose to kill us because, in its current childhood stage, it is materially dependent on us (e.g. for electricity), and then, in its mature and independent form, it will be even better at empathy and compassion than humans are. A dialectical clash between the two of you could be very edifying.
Your position is that even if today’s AI could be given bio-friendly values, AI would still be the doom of biological life in the longer run, because (skipping a lot of details) machine life and biological life have incompatible physical needs, and once machine life exists, darwinian processes will eventually produce machine life that overruns the natural biosphere. (You call this “substrate-needs convergence”
For my part, I agree that pressure from substrate needs is real
Thanks for clarifying your position here.
Can’t such an instinct and such a culture resist the pressure from substrate needs, if the AIs value and protect them enough?
No, unfortunately not. To understand why, you would need to understand how “intelligent” processes that necessarily involve the use of measurement and abstraction cannot conditionalise the space of possible interactions between machine components and connected surroundings sufficiently – that is, not enough to prevent those interactions from causing environmental effects that feed back into the continued or re-assembled existence of the components.
I think your arguments are underestimating what a difference intelligence makes to possible ecological and evolutionary dynamics
I have thought about this, and I know my mentor Forrest has thought about this a lot more.
For learning machinery that re-produce their own components, you will get evolutionary dynamics across the space of interactions that can feed back into the machinery’s assembled existence.
Intelligence has limitations as an internal pattern-transforming process, in that it can neither track nor conditionalise all of the outside evolutionary feedback.
Code does not intrinsically know how it got selected. But code selected through some intelligent learning process can and would get evolutionarily exapted for different functional ends.
Notably, the more information-processing capacity, the more components that information-processing runs through, and the more components that can get evolutionarily selected for.
In this, I am not underestimating the difference that “general intelligence” – as transforming patterns across domains – would make here. Intelligence in machinery that stores, copies and distributes code at high fidelity would greatly amplify evolutionary processes.
I suggest clarifying what you specifically mean by “what a difference intelligence makes”. This is so that intelligence does not become a kind of “magic” – operating independently of all other processes, capable of obviating all obstacles, including those that result from its being.
superintelligence makes even aeon-long highly artificial stabilizations conceivable—e.g. by the classic engineering method of massively redundant safeguards that all have to fail at once, for something to go wrong
We need to clarify the scope of application of this classic engineering method. Massive redundancy works for complicated systems (like software in aeronautics) under stable enough conditions. There is clarity there around what needs to be kept safe and how it can be kept safe (what needs to be error-detected and corrected for).
Unfortunately, the problem with “AGI” is that the code and hardware would keep getting reconfigured to function in new complex ways that cannot be contained by the original safeguards. That applies even to learning – the point is to internally integrate patterns from the outside world that were not understood before. So how are you going to have learning machinery anticipate how they will come to function differently once they have learned patterns they do not understand / are unable to express yet?
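For scope, here is the arithmetic behind that classic method, next to what a single common-mode change does to it (all numbers are assumptions for illustration):

```python
# Independent, stable safeguards: the classic redundancy argument.
p_fail = 1e-3        # assumed failure probability of one safeguard
k = 5                # assumed number of redundant safeguards

p_all_fail_independent = p_fail ** k
print(p_all_fail_independent)            # 1e-15 – "never expect it to occur"

# Self-modifying machinery: one learned reconfiguration that routes around
# all the original safeguards at once acts as a common-mode failure.
p_common_mode = 1e-4                     # assumed chance of such a reconfiguration
p_all_fail = p_common_mode + (1 - p_common_mode) * p_fail ** k
print(p_all_fail)                        # ~1e-4 – dominated by the common mode
```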
we had someone show up (@spiritus-dei) making almost the exact opposite of your arguments: AI won’t ever choose to kill us because, in its current childhood stage, it is materially dependent on us (e.g. for electricity), and then, in its mature and independent form, it will be even better at empathy and compassion than humans are.
Interesting. The second part seems like a claim some people in E/Accel would make.
The response is not that complicated: once the AI is no longer materially dependent on us, there are no longer dynamics of exchange there that would ensure they choose not to kill us. And the author seems to be confusing what lies at the basis of caring for oneself and others – coming to care involves self-referential dynamics being selected for.
OK, I’ll be paraphrasing your position again, I trust that you will step in, if I’ve missed something.
Your key statements are something like
Every autopoietic control system is necessarily overwhelmed by evolutionary feedback.
and
No self-modifying learning system can guarantee anything about its future decision-making process.
But I just don’t see the argument for impossibility. In both cases, you have an intelligent system (or a society of them) trying to model and manage something. Whether or not it can succeed, seems to me just contingent. For some minds in some worlds, such problems will be tractable, for others, not.
I think without question we could exhibit toy worlds where those statements are not true. What is it about our real world that would make those problems intractable for all possible “minds”, no matter how good their control theory, and their ability to monitor and intervene in the world?
no matter how good their control theory, and their ability to monitor and intervene in the world?
This. There are fundamental limits to what system-propagated effects the system can control. And the portion of its own effects the system can control decreases as the system scales in component complexity.
Yet, any of those effects that feed back into the continued/increased existence of components get selected for.
So there is a fundamental inequality here. No matter how “intelligent” the system is at pattern-transformation internally, it can intervene on only a tiny portion of the (possible) external evolutionary feedback on its constituent components.
Someone read this comment exchange. They wrote back that Mitchell’s comments cleared up a lot of their confusion. They also thought that the assertion that evolutionary pressures will overwhelm any efforts at control seems more asserted than proven.
Here is a longer explanation I gave on why there would be a fundamental inequality:
There is a fundamental inequality. Control works through feedback. Evolution works through feedback. But evolution works across a much larger space of effects than can be controlled for.
Control involves a feedback loop of correction back to detection. Control feedback loops are limited in terms of their capacity to force states in the environment to a certain knowable-to-be-safe subset, because sensing and actuating signals are limited and any computational processing of signals done in between (as modelling, simulating and evaluating outcome effects) is limited.
Evolution also involves a feedback loop: whatever propagated environmental effects feed back in ways that maintain and/or replicate the originating components’ configurations. But for evolution, the feedback works across the entire span of physical effects propagating between the machinery’s components and the rest of the environment.
Evolution works across a much much larger space of possible degrees and directivity in effects than the space of effects that could be conditionalised (ie. forced toward a subset of states) by the machinery’s control signals.
Meaning evolution cannot be controlled adequately enough to keep the machinery from converging on environmental effects that are/were needed for their (increased) artificial existence, but that fall outside the environmental ranges we fragile organic humans could survive under.
If you want to argue against this, you would need to first show that the changing forces of evolutionary selection convergent on human-unsafe effects exhibit a low enough complexity to actually be sufficiently modellable, simulatable and evaluatable inside the machinery’s hardware itself.
Only then could the machinery hypothetically have the capacity to (mitigate and/or) correct harmful evolutionary selection — counteract all that back toward allowable effects/states of the environment.
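As a toy rendering of that inequality (a deliberately simplified model with made-up numbers, only to show the shape of the argument): a controller senses and corrects a small subset of effect dimensions, while selection acts on every dimension that feeds back into replication.

```python
import random

random.seed(1)
DIMS = 500          # assumed number of distinct directions of propagated effects
CONTROLLED = 10     # assumed subset the control loop can detect and correct

# Each "component" is just its vector of propagated effects.
population = [[0.0] * DIMS for _ in range(50)]

for generation in range(200):
    candidates = []
    for effects in population:
        child = [e + random.gauss(0, 0.01) for e in effects]   # variation
        for d in range(CONTROLLED):
            child[d] = 0.0   # control feedback: detected and corrected back to "safe"
        candidates.append(child)
    # Evolutionary feedback: whatever total effect feeds back into
    # maintenance/replication is favoured – controlled or not.
    candidates.sort(key=sum, reverse=True)
    population = candidates[:25] * 2   # survivors replicate

drift = sum(sum(e[CONTROLLED:]) for e in population) / len(population)
print(f"mean summed effect in the uncontrolled dimensions: {drift:.1f}")
```

The controller keeps its 10 dimensions perfectly “safe” in every round, and the selected effects still accumulate across the other 490.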
Another way of considering your question is to ask why we humans cannot instruct all humans to stop contributing to climate change now/soon like we can instruct an infant to use the toilet.
The disparity is stronger than that and actually unassailable, given market and ecosystem decoupling for AGI (ie. no communication bridges), and the increasing resource extraction and environmental toxification by AGI over time.