It seems like the solution space to the existential threat of AGI can be described as follows:
Solutions which convey a credible threat* to any AGI: if it decides to eradicate humanity, we will make it physically impossible** for it to achieve some desirable outcome X and/or to prevent some undesirable outcome Y, where the value of X or the cost of Y exceeds, by orders of magnitude, the value it would obtain by eradicating humanity, such that even a small chance of the threat materializing makes eradication a poor option***.
*Probably backed by a construction of some kind. For example: establishing, on several distant planets, a latent system that we'll call AGI 2. If activated through the detection of machine extermination of humans, AGI 2 would simultaneously be green-lit to climb the intelligence ladder and be made aware of AGI 1, which evidently cannot cohabitate with other intelligences and would seek to exterminate AGI 2 as well. The hope is that AGI 2 would then attack AGI 1. A third, lesser intelligence would automatically, at a later date set in stone, notify AGI 1 of AGI 2's existence and simultaneously notify AGI 2 of this fact, so that if AGI 2 has by that point climbed the intelligence ladder but not attacked AGI 1, it will be forced to attack AGI 1 in order to self-preserve. Humans would notify AGIs of the construction of this system as part of the credible threat, in such a way that AGI 1 could never be 100% assured that any attempt to disable the system had fully succeeded. Note: this was just one fantastical example; it could equally have been a system on Earth or in our solar system achieving the same level of credible threat in a more practical or reliable fashion, albeit with more proximate exposure to AGI 1.
**As in, the AGI would need to violate the laws of physics to do so (e.g., travel faster than the speed of light).
***Assuming eradication of humans would be orders of magnitude less valuable to an AGI than its most desired outcome, whatever that may be. If we can't confidently predict anything the AGI will desire that much, we can still assume that AGI 1 is at least significantly interested in self-preservation, enough to make the threat of undesirable outcome Y, which includes an attack by AGI 2, a viable deterrent.
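The deterrence condition above reduces to a simple expected-value comparison from the AGI's perspective. Here is a minimal sketch of that arithmetic; the function name and all numbers are hypothetical placeholders for illustration, not estimates of anything:

```python
# Hedged sketch of the deterrence condition: eradication is only a rational
# choice if its value exceeds the expected cost of the threatened outcome Y.
# All utilities and probabilities below are assumed, illustrative values.

def eradication_is_rational(value_of_eradication: float,
                            cost_of_outcome_y: float,
                            p_threat_materializes: float) -> bool:
    """True only if eradicating humanity is worth more than the
    probability-weighted cost of outcome Y (e.g. an attack by AGI 2)."""
    expected_cost = p_threat_materializes * cost_of_outcome_y
    return value_of_eradication > expected_cost

# If the cost of Y is orders of magnitude larger than the value of
# eradication, even a small chance of the threat materializing
# makes eradication a poor option:
print(eradication_is_rational(value_of_eradication=1.0,
                              cost_of_outcome_y=1_000.0,
                              p_threat_materializes=0.01))  # False: 1.0 < 10.0
```

Note how the asymmetry does the work: with a 100x cost gap, the threat only needs to be credible at the percent level to deter.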
I am not a computer scientist or AI researcher, so I say the above assuming there is a high probability of a major gap in my postulation. I am as interested in the details of why I am partially or fully wrong as I am in being correct. Thanks in advance to anyone who responds.