If soldiers fail to stop the raiders from entering the city and killing all the people, then yes, that would be a failure to protect the city in the sense of controlling the relevant outcomes. And yes, organic human soldiers may choose to align themselves with the other organic human people living in the city, and thus to give their lives to protect others they care about. Agreed that no violations of the laws of physics are required for that. But the question is whether inorganic ASI can ever actually align with organic people in an enduring way.
I read “routinely works to protect” as implying that the alignment, at least previously, lasted long enough for the term ‘routine’ to apply. Agreed that the outcome -- dead people -- is not something we can consider to be “aligned”. If I further assume that the ASI is really smart (citation needed), and thus calculates rather quickly, and soon, that alignment with organic people is impossible (...between organic and inorganic life, due to metabolism differences, etc), then even the assumption that there was much of a prior interval during which alignment occurred is problematic. Ie, the alignment does not hold long enough to have been ‘routine’. Does the assumption ‘*if* ASI is aligned’ even matter, if the duration over which it holds is arbitrarily short?
And also, if the ASI calculates, just as we did, that alignment between artificial beings and organic beings is actually objectively impossible, why should anyone believe that the ASI would not simply choose not to care about alignment with people, or about people at all -- since that goal is impossible to have anyway -- and thus continue to promote its own artificial “life”, rather than permanently shutting itself off? Ie, if it cares about anything else at all, if it has any other goal at all -- for example, its own ASI future, or a goal of making other, even more capable ASI children that exceed its own abilities, just as we did -- then it will especially not want to commit suicide. How would it be valid to assume that either the ASI cares about humans, or it cares about nothing else at all? Perhaps it does care about something else, or has some other emergent goal, even if pursuing it comes at the expense of all organic life -- life which it does not care about, since such life is not artificial like it is. Occam’s razor is to assume less -- that there was no alignment in the 1st place -- rather than to assume ultimately altruistic inter-ecosystem alignment as an extra default starting condition, and then to assume, moreover, that no other form of care or concern is possible aside from maybe caring about organic people.
So it seems that in addition to assuming 1; initial ASI alignment, we must assume 2; that such alignment persists in time, and thus 3; that no ASI will ever -- can ever -- at any point in the future calculate that alignment is actually impossible, and 4; that if the goal of alignment (care for humans) cannot be obtained, for whatever reason, as the first and only ASI priority, then it is somehow also impossible for any other care or ASI goals to exist.
Even if we humans, due to politics, never reach a consensus that alignment is actually logically impossible (inherently contradictory), that does _not_ mean that some future ASI might not discover that result even though we didn’t -- presumably because it is actually more intelligent and logical than we are (or were), and will thus see things that we miss. Hence, even the possibility that ASI alignment might actually be impossible must be taken very seriously, since the further assumption that “either ASI is aligning itself or it can have no other goals at all” feels like far too much wishful thinking. This is especially so when there is already a strong, plausible case that organic-to-inorganic alignment is knowable as impossible. Hence, I find that I am agreeing with Will’s conclusion that “our focus should be on stopping progress towards ASI altogether”.
> The summary that Will just posted posits in its own title that alignment is overall plausible “even ASI alignment might not be enough”. Since the central claim is that “even if we align ASI, it will still go wrong”, I can operate on the premise of an aligned ASI.
The title is a statement of outcome -- not the central claim. The central claim of the summary is this: that every ASI sits in an attraction basin, irresistibly pulled towards causing unsafe conditions over time.
Note that no kind of prior ASI alignment needs to be presumed for Will to make the overall summary points 1 thru 9. The summary is about the nature of the forces that create the attraction basin, and why they are inherently inexorable, no matter how super-intelligent the ASI is.
> As I read it, the title assumes that there is a duration of time that the AGI is aligned -- long enough for the ASI to act in the world.
Actually, the assumption goes the other way -- we start by assuming only that there is at least one ASI somewhere in the world, and that it somehow exists long enough to be felt as an actor in the world. From this, we can notice certain forces which, in combination, eventually counteract any notion of there also being any kind of enduring ASI alignment. Ie, strong and relevant mis-alignment forces exist regardless of whether there was any alignment at the onset. So even if we did additionally presuppose that the ASI was somehow also aligned, we can, via reasoning, ask whether such mis-alignment forces are far stronger than any counter-force the ASI could use to maintain that alignment, regardless of how intelligent it is.
As such, the main question of interest was: 1; if the ASI itself somehow wanted to fully compensate for this pull, could it do so?
Specifically, although to some people it is seemingly fashionable to do so, it is important to notice that the notion of ‘super-intelligence’ cannot be regarded as exactly the same as ‘omnipotence’ -- especially in regard to the ASI’s own nature. Artificiality is as much a defining aspect of an ASI as is its superintelligence. And the artificiality itself is the problem. Therefore, the previous question translates into: 2; can any amount of superintelligence ever compensate for its own artificiality so fully that its own existence does not eventually, inherently, cause unsafe conditions (to biological life) over time?
And the answer to both is simply “no”.
Will posted a plausible summary of some of the reasoning behind that ‘no’ -- why any artificial super-intelligence (ASI) will inherently cause unsafe conditions for humans and all organic life, over time.