As people get more desperate to prevent AGI x-risk (e.g., as AI progress draws closer and closer to AGI without satisfactory progress in alignment), they will inevitably become more reckless, resorting to so-called "hail mary" and more "rushed" alignment techniques that carry a higher chance of s-risk. These are not the careful, "principled," formal-theory-based techniques (e.g., MIRI's Agent Foundations agenda), but hastier last-ditch ideas that could have more unforeseen consequences or fail in nastier ways, including s-risks. This is a phenomenon we need to be highly vigilant in working to prevent. Otherwise it is virtually assured to happen: faced with the arrival of AGI and no good formal solution to alignment in hand, most humans would likely choose a strategy with at least some chance of working (trying a hail mary technique) over certain death (deploying their AGI without any alignment at all), despite the worse s-risk implications. To illustrate this, even Eliezer Yudkowsky, who wrote "Separation from hyperexistential risk," has written the following (due to his increased pessimism about alignment progress):
At this point, I no longer care how it works, I don’t care how you got there, I am cause-agnostic about whatever methodology you used, all I am looking at is prospective results, all I want is that we have justifiable cause to believe of a pivotally useful AGI ‘this will not kill literally everyone’.
The big ask from AGI alignment, the basic challenge I am saying is too difficult, is to obtain by any strategy whatsoever a significant chance of there being any survivors. (source)
If even the originator of these ideas now has such a single-minded preoccupation with x-risk, to the detriment of s-risk, how could we expect better from anyone else? Basically, in the face of imminent death, people will get desperate enough to try anything to prevent it, and s-risk considerations become a secondary afterthought (if they were aware of s-risks at all). In the mad scramble to avert extinction, s-risks get trampled over, with potentially unthinkable results.

One possible way to mitigate this risk: rather than trying (perhaps unrealistically) to prevent every AI development group worldwide from attempting hail mary techniques if the "mainline"/ideal alignment directions don't bear fruit in time, we could hash out the different possible options in that class, analyze which ones carry unacceptably high s-risk and should definitely be avoided and which carry less s-risk and may be preferable, and publicize this research in advance for anyone eventually in that position to consult. This would at least raise awareness of s-risks among potential AGI deployers so that they incorporate it as a factor, and it would frontload their decision-making between different hail marys (which would otherwise be done under high time pressure, producing an even worse decision).
I believe that identifying which hail mary strategies are particularly bad from an s-risk point of view would be a very important piece of work, assuming it has not been done already.
This seems like a good way to reduce s-risks, so I want to get this idea out there.
This is copied from the r/SufferingRisk subreddit here: https://www.reddit.com/r/SufferingRisk/wiki/intro/