The “survival bias” argument overgeneralizes. For each technology mentioned, and many others, the number of wrong ways to use or construct an implementation of the technology greatly exceeds the number of correct ways. We found the correct ways through systematic iteration.
As a simple example, fire has escaped engines many times and caused all types of vehicles to burn. It took methodical, careful iteration to improve engines to the point that this usually doesn’t happen, and vehicles have fire suppression systems, firewalls, and many other design elements to deal with this expected risk. Note that we do trade away performance for this: even combat jet fighters carry the extra weight of fire suppression systems.
Worst case you burn down Chicago.
Humans would be able to do the same for AI if they can iterate on many possible constructions for an AGI, cleaning up the aftermath in the cases where they discover significant flaws late (which is why deception is such a problem). The “doom” argument is that an AGI can be built with such an advantage that it kills or disempowers humans before humans, and other AI systems working for humans, can react.
To support the doom argument, you need to provide evidence for these main points:
(1) humans can construct an ASI, and provide the information necessary to train it, such that it has a massive capability margin over human beings, AND
(2) the ASI can run at useful timescales while performing at this level of cognition, doing inference on computational hardware humans can build, AND
(3) whatever resources (robotics, physical tasks performed) the ASI can obtain beyond those required for it to merely exist and satisfy humans (essentially “profit”) are enough to kill or disempower humans, AND
(4a) other AGI/ASI built by humans, and humans themselves, are unable to stop it because they are less intelligent, despite potentially having a very large (orders-of-magnitude) advantage in resources and weapons, OR
(4b) humans are scammed into supporting the ASI.
If you want to argue against doom, or look for alignment ideas, you can look for ways to limit each of these points. For example:
(1) Does intelligence actually scale this way, or does it hit diminishing returns? An accelerationist argument would point to current data across many experiments suggesting returns are in fact diminishing, or to theoretical optimal-policy arguments that returns always diminish.
An alignment idea would be to subdivide AGI/ASI systems into smaller, better-defined systems; because of diminishing returns, you would not expect more than a slight performance penalty.
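One way to picture this subdivision (a toy sketch only; the task names, stub implementations, and router are hypothetical illustrations, not a real architecture):

```python
from typing import Callable, Dict

# Each subsystem is a narrow, separately trained and separately auditable
# component with a well-defined task. Stubbed out here with placeholders.
def translate(text: str) -> str:
    return f"[translation of: {text}]"

def summarize(text: str) -> str:
    return f"[summary of: {text}]"

SUBSYSTEMS: Dict[str, Callable[[str], str]] = {
    "translate": translate,
    "summarize": summarize,
}

def route(task_type: str, payload: str) -> str:
    """Dispatch each request to one narrow subsystem; no single component
    gets a global view of all tasks or the ability to plan across them."""
    if task_type not in SUBSYSTEMS:
        raise ValueError(f"no subsystem registered for {task_type!r}")
    return SUBSYSTEMS[task_type](payload)

print(route("summarize", "a long document"))
```

If returns to scale really are diminishing, the composed system gives up little capability relative to a monolith, while each piece stays easier to specify and test.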
(2) This is a diminishing-returns argument: you need exponentially more compute to get linearly more intelligence. An accelerationist argument would count how many thousand H100s one running ‘instance’ of a strong ASI would likely need, and point out that worldwide compute production won’t be enough for decades at the current ramp rate (and a pro-doom argument would point out that production can be scaled up by many orders of magnitude).
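To make the shape of this argument concrete, here is a toy back-of-envelope calculation. Every number in it is a hypothetical placeholder, not a real estimate; the point is only that under an assumed “exponentially more compute per unit of capability” scaling, the per-instance GPU count blows up quickly:

```python
# All numbers below are hypothetical placeholders for illustration only.
COMPUTE_MULTIPLIER = 10      # assumed: each +1 "capability unit" costs 10x more compute
HUMAN_LEVEL_FLOPS = 1e15     # assumed inference throughput for ~human-level performance
H100_FLOPS = 1e15            # rough order of magnitude for one H100

def h100s_per_instance(units_above_human: int) -> float:
    """GPUs needed to run one real-time instance under the assumed scaling."""
    required = HUMAN_LEVEL_FLOPS * (COMPUTE_MULTIPLIER ** units_above_human)
    return required / H100_FLOPS

for margin in range(5):
    print(f"+{margin} capability units above human: ~{h100s_per_instance(margin):,.0f} H100s")
```

Whether the multiplier per capability step is closer to 2x or 100x is exactly the empirical question this point turns on.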
An alignment idea would be to register and track where high-performance AI chips are purchased and deployed, limit deployment to licensed data centers, and audit those data centers to ensure all their workloads belong to human customers and that they are not harboring an escaped AGI.
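A minimal sketch of the bookkeeping side of that idea (the schema and checks are hypothetical; a real scheme would rest on legal and hardware attestation, not a Python dict):

```python
from dataclasses import dataclass

@dataclass
class ChipRecord:
    serial: str
    owner: str
    datacenter: str    # where the chip is reported to be deployed
    licensed: bool     # whether that datacenter holds a deployment license

REGISTRY: dict[str, ChipRecord] = {}

def register(record: ChipRecord) -> None:
    """Record a chip at time of sale or deployment."""
    REGISTRY[record.serial] = record

def audit(observed_serials: set[str]) -> list[str]:
    """Flag chips found in a datacenter that are unregistered or unlicensed."""
    return [
        serial for serial in observed_serials
        if (rec := REGISTRY.get(serial)) is None or not rec.licensed
    ]

register(ChipRecord("H100-0001", "ExampleCloud", "dc-west-1", licensed=True))
print(audit({"H100-0001", "H100-9999"}))   # -> ["H100-9999"]
```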
(3) An accelerationist would argue that humans can prevent doom with sparse, tightly supervised systems, and that humans will do this naturally because it is what the efficient market hypothesis (EMH) demands: competing AGIs in a free market will not have any spare capacity to kill humans, as they are too busy spending all their resources trying to make money.
This is the sparsity/myopia/“deception? I ain’t got time for that” family of approaches.
(4a) An accelerationist would argue that humans should race to build many kinds of powerful but restricted AGI/ASI as rapidly as possible, so that a large stockpile of weapons and capabilities is available to stop AGI doom.
Note that this is what every alignment lab ends up doing. I have talked to one person who suggested developing mass-producible combat drones as an alignment strategy: an offensive defense against hostile AGI/ASI via the capability to deploy very large numbers of what are essentially smart bombs. So long as humans retain control, this might be a viable idea…
(4b) This is an argument that we’re screwed; the accelerationist version would be that if humans are this stupid, they deserve to die.
I’m not sure how we fight this; it is my biggest fear: that we can win on a technical level, and win without needing unrealistic international cooperation, but we still die because we got scammed. Lightly touching politics: this seems entirely plausible. There are many examples of democracies failing and picking obviously self-interested leaders who are clearly unqualified for the role.
Summary: the accelerationist arguments made in this debate are weak. I pointed out some stronger ones.
Hi Gerald, thanks for your comment! Note that I am arguing neither in favour of nor against doom. What I am arguing, though, is the following: it is not good practice to group AI with technologies that we were able to iteratively improve towards safety when you are trying to prove that AI is safe. The point is that, without further arguments, you could easily make the reverse argument and it would have roughly equal force:
P1 Many new technologies are often unsafe and impossible to iteratively improve (e.g. airships).
P2 AI is a new technology.
C1 AI is probably unsafe and impossible to iteratively improve.
That is why I argue that this is not a good argument template: through survivorship bias in P1, you’ll always be able to sneak in whatever it is you’re trying to prove.
With respect to your arguments about doom scenarios, I think they are really interesting and I’d be excited to read a post with your thoughts (maybe you already have one?).