That’s an important consideration. Good to dig into.
I think there are many instances of humans, flawed and limited though we are, managing to operate systems with a very low failure rate.
Agreed. Engineers are able to make very complicated systems function with very low failure rates.
Given the extreme risks we’re facing, I’d want to check whether that claim also translates to ‘AGI’.
Does how we manage current software and hardware systems to keep operating correspond soundly with how self-learning and self-maintaining machinery (‘AGI’) would control how their components operate?
Given ‘AGI’ that no longer need humans to continue operating and maintaining their own functional components over time, would the ‘AGI’ end up operating in ways that are categorically different from how our current software-hardware stacks operate?
Given that we can manage to operate current relatively static systems to have very low failure rates for the short-term failure scenarios we have identified, does this imply that the effects of introducing ‘AGI’ into our environment could also be controlled to have a very low aggregate failure rate – over the long term across all physically possible (combinations of) failures leading to human extinction?
This gets right into the topic of the conversation with Anders Sandberg. I suggest giving that a read!
…to spend extra resources on backup systems and safety, such that small errors get actively cancelled out rather than compounding.
Errors can be corrected with high confidence (consistency) at the bit level. Backups and redundancy also work well in, e.g., aeronautics, where the code base itself is not constantly changing.
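As a minimal sketch of what bit-level correction looks like (an illustrative toy, not any system referenced here): a 3x repetition code with majority-vote decoding reliably corrects any single flipped bit per triple.

```python
import random

def encode(bits):
    # Repeat every bit three times.
    return [b for b in bits for _ in range(3)]

def decode(coded):
    # Majority-vote each triple back into a single bit.
    return [1 if sum(coded[i:i + 3]) >= 2 else 0 for i in range(0, len(coded), 3)]

def flip_one_bit(coded):
    # Inject a single bit-flip somewhere in the codeword.
    corrupted = list(coded)
    i = random.randrange(len(corrupted))
    corrupted[i] ^= 1
    return corrupted

message = [1, 0, 1, 1, 0, 0, 1, 0]
received = flip_one_bit(encode(message))
assert decode(received) == message  # any single flipped bit is corrected
```

The guarantee holds at the bit level precisely because the possible failure modes (isolated bit flips) are fully enumerated in advance. The questions below are about what happens when they are not.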
How does the application of error correction change at larger scales?
How completely can possible errors be defined and corrected for at the scale of, for instance:
software running on a server?
a large neural network running on top of the server software?
an entire machine-automated economy?
Do backups work when the runtime code keeps changing (as learned from new inputs), and when hardware configurations can also subtly change (through physical assembly processes)?
Since intelligence is explicitly the thing which is necessary to deliberately create and maintain such protections, I would expect control to be easier for an ASI.
It is true that ‘intelligence’ affords more capacity to control environmental effects.
Notice too that the more ‘intelligence’, the more information-processing components. And the more information-processing components are added, the exponentially more degrees of freedom of interaction those and other functional components can have with each other and with connected environmental contexts.
Here is a nitty-gritty walk-through in case useful for clarifying components’ degrees of freedom.
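Separately from that walk-through, a rough back-of-the-envelope sketch (illustrative numbers only): with n components, pairwise couplings already grow as n(n-1)/2, and the possible combinations of interacting components grow as 2^n, far outpacing the growth in the number of components themselves.

```python
from math import comb

# Rough scaling sketch: counts of possible interactions vs. component count.
# The numbers are purely illustrative; real coupling structure is an assumption.
for n in (10, 100, 1000):
    pairs = comb(n, 2)                 # distinct pairwise couplings: n*(n-1)/2
    subset_digits = len(str(2 ** n))   # order of magnitude of 2^n combinations
    print(f"{n:>5} components: {pairs:>7} pairwise couplings, "
          f"~10^{subset_digits - 1} possible combinations")
```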
I disagree that small errors necessarily compound until reaching a threshold of functional failure.
For this claim to be true, the following has to be true:
a. There is no concurrent process that selects for “functional errors” as convergent on “functional failure” (failure in the sense that the machinery fails to function safely enough for humans to exist in the environment, rather than that the machinery fails to continue to operate).
Unfortunately, in the case of ‘AGI’, there are two convergent processes we know about:
Instrumental convergence, resulting from internal optimization: code components being optimized for (an expanding set of) explicit goals.
Substrate-needs convergence, resulting from external selection: all components being selected for (an expanding set of) implicit needs.
Or else – where there is indeed selective pressure convergent on “functional failure” – then the following must be true for the quoted claim to hold:
b. The various errors introduced into and selected for in the machinery over time could be detected and corrected for comprehensively and fast enough (by any built-in control method) to prevent later “functional failure” from occurring.
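To make (b) more concrete, here is a toy model (the rates and structure are assumptions for illustration, not claims about any particular architecture): if the built-in control method only ever detects some fraction of newly introduced errors, the undetected remainder accumulates no matter how aggressively the detected ones are corrected.

```python
def residual_errors(steps, introduced_per_step, detection_coverage, correction_rate):
    # Toy dynamics: each step introduces new errors; only the detected share
    # is ever eligible for correction, at some per-step correction rate.
    detectable, undetectable = 0.0, 0.0
    for _ in range(steps):
        detectable += introduced_per_step * detection_coverage
        undetectable += introduced_per_step * (1 - detection_coverage)
        detectable -= detectable * correction_rate
    return detectable, undetectable

d, u = residual_errors(steps=1000, introduced_per_step=1.0,
                       detection_coverage=0.99, correction_rate=0.5)
print(round(d, 2), round(u, 2))
# Detected errors settle at a small steady state (~1 here), while the 1%
# that is never detected grows without bound (~10 after 1000 steps).
```

Whether anything like this applies to ‘AGI’ hinges on the detection coverage and correction speed actually achievable over the long term, which is exactly what is in question.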
As a real-world example, consider Boeing. The FAA and Boeing both, supposedly and allegedly, had policies and internal engineering practices—all of which are control procedures—which should have been good enough to prevent an aircraft from suddenly and unexpectedly losing a door during flight. Note that this occurred after an increase in control intelligence—after two disasters in which whole Max aircraft were lost. On the basis of small details of mere whim, of who chose to sit where, there could have been someone sitting in that particular seat. Their loss of life would surely count as a “safety failure”. Ie, it is directly “some number of small errors actually compounding until reaching a threshold of functional failure” (sic). As it is with any major problem like that—lots of small things compounding to make a big thing.
Control failures occur in all of the places where intelligence forgot to look, usually at some other level of abstraction than the one you are controlling for. Some person on some shop floor got distracted at some critical moment—maybe they got some text message on their phone at exactly the right time—and thus just did not remember to put the bolts in. Maybe some other worker happened to have had a bad conversation with their girlfriend that morning, and thus on that one day happened to never inspect the bolts on that particular door. Lots of small incidents—at least some of which should have been controlled for (and were not actually)—which combine in some unexpected pattern to produce a new possibility of outcome—explosive decompression.
So is it the case that control procedures work? Yes, usually, for most kinds of problems, most of the time. Does adding even more intelligence usually improve the degree to which control works? Yes, usually, for most kinds of problems, most of the time. But does that in itself imply that such—intelligence and control—will work sufficiently well for every circumstance, every time? No, it does not.
Maybe we should ask Boeing management to try to control the girlfriends of all workers so that no employees ever have a bad day and forget to inspect something important? What if most of the aircraft is made of ‘something important’ to safety—because, for example, the design maximizes fuel efficiency?
There will always be some level of abstraction—some constellation of details—for which some subtle change can result in wholly effective causative results. Given that a control model must be simpler than the real world, the question becomes: ‘are all relevant aspects of the world correctly modeled?’. Which is not just a question of whether the model is right, but of whether it is the right model—ie, the boundary between what is necessary to model and what is actually not important can itself be very complex, and that is a different kind of complexity than the complexity of the model itself. How do we ever know that we have modeled all relevant aspects in all relevant ways? That is an abstraction problem, and it is different in kind from the modeling problem. Stacking control process on control process, at however many meta levels, still does not fix it. And it gets worse as the complexity of the boundary between relevant and non-relevant increases, and worse again as the number of relevant levels of abstraction over which that boundary operates increases.
Basically, every (unintended) engineering disaster that has ever occurred indicates a place where the control theory being used did not account for some factor that later turned out to be vitally important. If we always knew in advance “all of the relevant factors”(tm), then maybe we could control for them. However, with the problem of alignment, the entire future is composed almost entirely of unknown factors—factors which are purely situational. And wholly unlike with every other engineering problem yet faced, we cannot, at any future point, ever assume that the number of relevant unknown factors will decrease. This is characteristically different from all prior engineering challenges—ones where more learning made controlling things more tractable. But ASI is not like that. It is itself learning. And this is a key difference and distinction. It runs up against the limits of control theory itself, against the limits of what is possible in any rational conception of physics. And if we continue to ignore that difference, we do so at our mutual peril.