One might wonder whether the inability to control one’s subsystems is a limitation that applies to ASI. Even ASI, however, faces causal limits to its ability to control the world. It would not be reasonable, for example, to assume that ASI will be capable of building perpetual motion machines or faster-than-light travel. One category of impossible tasks is complete prediction of all of the relevant consequences of an agent’s actions on the real world. Sensors can only take in limited inputs (affected by noise), actuators can only have limited influence (also affected by noise), and world-models and simulations necessarily make simplifying assumptions. In other words, the law of unintended consequences holds true even for ASI. Further, the scale of these errors increases as the ASI does things that affect the entire world, gains more interacting components, and must account for increasingly complex feedback loops.
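A toy sketch of the compounding-error point, assuming nothing beyond a feedback-driven process and a predictor whose model makes one small simplifying assumption; even that tiny simplification, run through the feedback loop, grows into large prediction error.

```python
# Toy illustration: a "world" with feedback dynamics (a logistic map) and a
# predictor whose model uses a slightly simplified parameter. The prediction
# error compounds over time instead of staying small.

def world_step(x, r=3.9):
    return r * x * (1 - x)  # the true feedback dynamics

def model_step(x, r=3.9, simplification=0.01):
    return (r - simplification) * x * (1 - x)  # the simplified world-model

x_true = x_model = 0.2
for t in range(1, 21):
    x_true = world_step(x_true)
    x_model = model_step(x_model)
    if t % 5 == 0:
        print(f"step {t:2d}: prediction error = {abs(x_true - x_model):.3f}")
```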
It does not seem at all clear to me how one can argue that unintended effects inevitably lead to a system as a whole going out of control. I agree that some small amount of error is nearly inevitable. I disagree that small errors necessarily compound until reaching a threshold of functional failure. I think there are many instances of humans, flawed and limited though we are, managing to operate systems with a very low failure rate. And importantly, it is possible to act at below-maximum-challenge-level, and to spend extra resources on backup systems and safety, such that small errors get actively cancelled out rather than compounding. Since intelligence is explicitly the thing which is necessary to deliberately create and maintain such protections, I would expect control to be easier for an ASI.
Without the specific assumption that an ASI would fail to keep its own systems under its control, the rest of the argument doesn’t hold.
On reflection, I suspect the crux here is a differing conception of what kind of failures are important. I’ve written a follow-up post that comes at this topic from a different direction and I would be very interested in your feedback: https://www.lesswrong.com/posts/NFYLjoa25QJJezL9f/lenses-of-control.
This sounds like a rejection of premise 5, not premises 1 & 2. Premises 1 & 2 assert that control issues are present at all (and 3 & 4 assert their relevance), whereas premise 5 asserts that the magnitude of these issues is great enough to kick off a process of accumulating problems. You are correct that the rest of the argument, including the conclusion, does not hold if this premise is false.
Your objection seems to point to the analogy of humans maintaining effective control of complex systems, with errors staying limited rather than compounding, together with the further assertion that a greater intelligence will be even better at such management.
Besides intelligence, there are two other core points of difference between humans managing existing complex systems and ASI:
1) The scope of the systems being managed. Implicit in what I have read of SNC is that ASI is shaping the course of world events.
2) ASI’s lack of inherent reliance on the biological world.
These points raise the following questions:
1) Do systems of control get better or worse as they increase in scope of impact, and where does this trajectory point for ASI?
2) To what extent is humans’ ability to control our created systems reliant on us being a part of and dependent upon the natural world?
This second question probably sounds a little weird, so let me unpack the associated intuitions, albeit at the risk of straying from the actual assertions of SNC. Technology that is adaptive becomes obligate, meaning that once it exists everyone has to use it so as not to get left behind by those who do. Using a given technology shapes the environment and also promotes certain behavior patterns, which in turn shape values and worldview. Together, these tendencies can sometimes produce feedback loops whose outcomes no one, including the creators of the technology, wants. In really bad cases, this can lead to self-terminating catastrophes (historically in local areas, now with the potential to reach global scale). Noticing and anticipating this pattern, however, leads to countervailing forces that push us to think more holistically than we otherwise would (either directly through extra planning or indirectly through customs of forgotten purpose). For an AI to fall into such a trap, however, would mean the death of humanity, not of the AI itself, so this countervailing force is not present.
That’s an important consideration. Good to dig into.
I think there are many instances of humans, flawed and limited though we are, managing to operate systems with a very low failure rate.
Agreed. Engineers are able to make very complicated systems function with very low failure rates.
Given the extreme risks we’re facing, I’d want to check whether that claim also translates to ‘AGI’.
Does the way we are able to manage the operation of current software and hardware systems correspond soundly with how self-learning and self-maintaining machinery (‘AGI’) would control how its own components operate?
Given ‘AGI’ that no longer needs humans to operate and maintain its own functional components over time, would the ‘AGI’ end up operating in ways that are categorically different from how our current software-hardware stacks operate?
Given that we can manage current, relatively static systems to have very low failure rates for the short-term failure scenarios we have identified, does this imply that the effects of introducing ‘AGI’ into our environment could also be controlled to a very low aggregate failure rate, over the long term, across all physically possible (combinations of) failures leading to human extinction?
This gets right into the topic of the conversation with Anders Sandberg. I suggest giving that a read!
to spend extra resources on backup systems and safety, such that small errors get actively cancelled out rather than compounding.
Errors can be corrected with high confidence (consistently) at the bit level. Backups and redundancy also work well in, e.g., aeronautics, where the code base itself is not constantly changing. (A small sketch of the bit-level case follows the questions below.)
How does the application of error correction change at larger scales?
How completely can possible errors be defined and corrected for at the scale of, for instance:
software running on a server?
a large neural network running on top of the server software?
an entire machine-automated economy?
Do backups work when the runtime code keeps changing (as learned from new inputs), and hardware configurations can also subtly change (through physical assembly processes)?
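As a minimal sketch of the bit-level case referred to above (triple redundancy with majority voting; my own illustration, not something specified in the thread), which is also the kind of mechanism the scaling questions are probing:

```python
import random

def store_with_redundancy(bits):
    """Keep three independent copies of each bit."""
    return [list(bits), list(bits), list(bits)]

def corrupt(copies, flip_prob=0.01):
    """Independently flip each stored bit with a small probability."""
    return [[b ^ (random.random() < flip_prob) for b in copy] for copy in copies]

def majority_read(copies):
    """Recover each bit by majority vote across the three copies."""
    return [1 if sum(column) >= 2 else 0 for column in zip(*copies)]

random.seed(0)
data = [random.randint(0, 1) for _ in range(10_000)]
recovered = majority_read(corrupt(store_with_redundancy(data)))
residual = sum(a != b for a, b in zip(data, recovered))
print(f"residual bit errors after majority vote: {residual} / {len(data)}")
# A bit is lost only if 2+ of its 3 copies flip: roughly 3 * 0.01**2 per bit.
```

The vote only helps, of course, while the three copies are copies of the same stable thing; once the underlying code and hardware keep changing, there is no fixed reference to vote against, which is what the last question above is pointing at.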
Since intelligence is explicitly the thing which is necessary to deliberately create and maintain such protections, I would expect control to be easier for an ASI.
It is true that ‘intelligence’ affords more capacity to control environmental effects.
Notice too that more ‘intelligence’ means more information-processing components. And the more information-processing components are added, the exponentially more degrees of freedom of interaction those and other functional components have with each other and with connected environmental contexts.
Here is a nitty-gritty walk-through in case useful for clarifying components’ degrees of freedom.
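As a rough back-of-envelope illustration (my own, separate from that walk-through) of how quickly interaction possibilities scale with the number of components:

```python
from math import comb

# Pairwise interaction channels grow quadratically with the number of
# components, while the number of distinct subsets of components that could
# jointly interact grows exponentially.
for n in (10, 100, 1000):
    pairwise = comb(n, 2)
    print(f"{n:5d} components: {pairwise:>7,} pairwise channels, "
          f"~{2.0 ** n:.1e} possible interaction subsets")
```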
I disagree that small errors necessarily compound until reaching a threshold of functional failure.
For this claim to be true, the following has to be true:
a. There is no concurrent process that selects for “functional errors” as convergent on “functional failure” (failure in the sense that the machinery fails to function safely enough for humans to exist in the environment, rather than that the machinery fails to continue to operate).
Unfortunately, in the case of ‘AGI’, there are two convergent processes we know about:
Instrumental convergence, resulting from internal optimization: code components being optimized for (an expanding set of) explicit goals.
Substrate-needs convergence, resulting from external selection: all components being selected for (an expanding set of) implicit needs.
Or else – where there is indeed selective pressure convergent on “functional failure” – then the following must be true for the quoted claim to hold:
b. The various errors introduced into and selected for in the machinery over time could be detected and corrected for comprehensively and fast enough (by any built-in control method) to prevent later “functional failure” from occurring.
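A toy model of condition b (my own sketch, not part of the original argument): if errors are introduced, or selected for, at some rate, and the built-in control method can only correct a bounded amount per step, then the backlog stays flat while correction keeps pace and grows without bound once it falls behind.

```python
def accumulated_errors(intro_rate, correction_capacity, steps=1000):
    """Toy model: errors arrive each step; the built-in control loop can
    correct at most `correction_capacity` outstanding errors per step."""
    backlog = 0.0
    for _ in range(steps):
        backlog += intro_rate                              # errors introduced / selected for
        backlog = max(0.0, backlog - correction_capacity)  # detection and correction
    return backlog

print("correction keeps pace:   backlog =", accumulated_errors(1.0, 1.2))
print("correction falls behind: backlog =", accumulated_errors(1.2, 1.0))
```

Whether the second regime is the one that applies to ‘AGI’ is exactly what conditions a and b are about.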
As a real-world example, consider Boeing. The FAA and Boeing both, supposedly and allegedly, had policies and internal engineering practices (all of which are control procedures) that should have been good enough to prevent an aircraft from suddenly and unexpectedly losing a door during flight. Note that this occurred after an increase in control intelligence, i.e., after two disasters in which whole Max aircraft were lost. On the basis of small details of mere whim, of who chose to sit where, there could have been someone sitting in that particular seat. Their loss of life would surely count as a “safety failure”. I.e., it is directly “some number of small errors actually compounding until reaching a threshold of functional failure” (sic). As it is with any major problem like that: lots of small things compounding to make a big thing.
Control failures occur in all of the places where intelligence forgot to look, usually at some other level of abstraction than the one you are controlling for. Some person on some shop floor got distracted at some critical moment (maybe they got a text message on their phone at exactly the right time) and thus just did not remember to put the bolts in. Maybe some other worker happened to have had a bad conversation with their girlfriend that morning, and thus on that one day happened never to inspect the bolts on that particular door. Lots of small incidents, at least some of which should have been controlled for (and were not), combining in some unexpected pattern to produce a new possibility of outcome: explosive decompression.
So is it the case that control procedures work? Yes, usually, for most kinds of problems, most of the time. Does adding even more intelligence usually improve the degree to which control works? Yes, usually, for most kinds of problems, most of the time. But does that in itself imply that such—intelligence and control—will work sufficiently well for every circumstance, every time? No, it does not.
Maybe we should ask Boeing management to try to control the girlfriends of all workers so that no employee ever has a bad day and forgets to inspect something important? And what if most of the aircraft consists of ‘something important’ to safety, because, for example, it has been optimized to maximize fuel efficiency?
There will always be some level of abstraction, some constellation of details, at which some subtle change can result in wholly effective causative results. Given that a control model must be simpler than the real world, the question becomes: are all relevant aspects of the world correctly modeled? That is not just a question of whether the model is right, but of whether it is the right model. The boundary between what is necessary to model and what is actually not important can itself be very complex, and that is a different kind of complexity than the complexity of the model itself. How do we ever know that we have modeled all relevant aspects in all relevant ways? That is an abstraction problem, and it is different in kind from the modeling problem. Stacking control process on control process, at however many meta levels, still does not fix it. And it gets worse as the complexity of the boundary between relevant and non-relevant increases, and worse again as the number of relevant levels of abstraction over which that boundary operates increases.
Basically, every (unintended) engineering disaster that has ever occurred indicates a place where the control theory being used did not account for some factor that later turned out to be vitally important. If we always knew in advance “all of the relevant factors” (tm), then maybe we could control for them. However, with the problem of alignment, the entire future is composed almost entirely of unknown factors, factors which are purely situational. And wholly unlike every other engineering problem yet faced, we cannot at any future point assume that the number of relevant unknown factors will ever decrease. This is characteristically different from all prior engineering challenges, where more learning made controlling things more tractable. But ASI is not like that: it is itself learning, and that is a key difference. It runs up against the limits of control theory itself, against the limits of what is possible in any rational conception of physics. And if we continue to ignore that difference, we do so at our mutual peril.