I think many-people-can-build-AGI scenarios are unlikely because before they happen, we’ll be in a situation where a-couple-people-can-build-AGI, and probably someone will build one at that point. And once there is at least one AGI running around, things will either get a lot worse or a lot better very quickly.
I think many-people-can-build-AGI scenarios are still likely enough to be worth thinking about, though, because they could happen if there is a huge amount of hardware overhang (and insufficient secrecy about AGI-building techniques) or if there is a successful-for-some-time policy effort to ban or restrict AGI research.
I think the second scenario you bring up is also interesting. It’s sorta a rejection of my “things will either get a lot worse or a lot better very quickly” claim above. I think it is also plausible enough to think more about.
Hmm, I find it plausible that, on average, p(build unaligned AGI | can build unaligned AGI) is about 0.01, which implies that unaligned AGI gets built once there are ~100 actors that can build AGI; that seems to fit many-people-can-build-AGI.
The 0.01 probability could happen because of regulations / laws, as you mention, but also if the world has sufficient common knowledge of the risks of unaligned AGI (which seems not implausible to me, perhaps because of warning shots, or because of our research, or because of natural human risk aversion).
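To make the arithmetic behind the ~100-actors figure explicit, here is a minimal sketch. It assumes (an assumption of this sketch, not something stated above) that each capable actor independently builds unaligned AGI with the same per-actor probability of 0.01:

```python
# Back-of-the-envelope check of the "~100 actors" figure above.
# Assumption (mine, not the commenter's): each actor that *can* build
# unaligned AGI does so independently with probability p = 0.01.
p = 0.01

for n in [1, 10, 50, 100, 200, 500]:
    expected_builds = n * p                # expected number of unaligned AGIs built
    p_at_least_one = 1 - (1 - p) ** n      # chance that at least one actor builds one
    print(f"n={n:3d}  expected builds={expected_builds:5.2f}  P(at least one)={p_at_least_one:.2f}")

# At n = 100 the expected number of builds reaches ~1 (P(at least one) is ~0.63),
# which is one way to read "unaligned AGI is built when there are ~100 actors".
```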
I guess you are more optimistic than me about humanity. :) I hope you are right!
Good point about warning shots leading to common knowledge. I am pessimistic that mere argumentation and awareness-raising will be able to achieve an effect that large, but combined with a warning shot it might.
But I am skeptical that we’ll get sufficiently severe warning shots. I think that by the time AGI gets smart enough to cause serious damage, it’ll also be smart enough to guess that humans would punish it for doing so, and that it would be better off biding its time.
Out of the two people I’ve talked to who considered building AGI an important goal of theirs, one said “It’s morally good for AGI to increase complexity in the universe,” and the other said, “Trust me, I’m prepared to walk over bodies to build this thing.”
Probably those weren’t representative, but this “2 in 2” experience does make me skeptical about the “1 in 100” figure.
(And those strange motivations I encountered don’t even factor in people doing the wrong thing by accident, which seems even more likely to me.)
I think some people are temperamentally incapable of being appropriately cynical about the way things are, so I find it hard to decide whether non-pessimistic AGI researchers (of whom there are admittedly many within EA) happen to be like that, or whether they accurately judge that people at the frontier of AGI research are unusually sane and cautious.
I don’t expect the first AGI to have that much influence (assuming gradual progress). Here’s an example of what fits my model: there is one giant-research-project AGI that costs $10b to deploy (and maybe $100b to R&D), 100 slightly worse pre-AGIs that cost perhaps $100m each to deploy, and 1m again-slightly-worse pre-AGIs that cost $10k per copy. So at any point in time we have a lot of AI systems that, together, are more powerful than the small number of most impressive systems.
This reasoning can break if deployment turns out to be very cheap (i.e. low marginal cost compared to fixed cost); then there will be lots of copies of the most impressive system. In that case it matters a lot who uses the copies. Are they kept secret and only deployed for internal use? Or are they sold in some form? (E.g. the supplier sells access to its system so that customers can fine-tune it for tasks like financial trading.)
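For concreteness, here is the aggregate deployment spend implied by the illustrative tier costs above; this is just a toy tabulation of the example numbers, not a claim about how capability scales with money:

```python
# Aggregate deployment spend per tier, using the illustrative numbers above.
# This ignores how capability scales with spend; it just shows that the lower
# tiers collectively command roughly as many resources as the flagship system.
tiers = [
    ("giant-research-project AGI", 1,         10_000_000_000),  # $10b to deploy
    ("slightly worse pre-AGIs",    100,       100_000_000),     # $100m each
    ("third-tier pre-AGIs",        1_000_000, 10_000),          # $10k per copy
]

for name, count, unit_cost in tiers:
    total = count * unit_cost
    print(f"{name:28s} count={count:>9,}  unit cost=${unit_cost:>14,}  total=${total:>14,}")

# Every tier totals $10b of deployment spend, so the many cheaper systems are,
# in aggregate, resourced comparably to the single most impressive one.
```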
I think AGIs which are copies of each other—even AGIs which are built using the same training method—are likely to coordinate very well with each other even if they are not given information about each other’s existence. Basically, they’ll act like one agent, as far as deception and treacherous turns and decisive strategic advantage are concerned.
EDIT: Also, I suspect this coordination might extend further, to AGIs with different architectures. Thus even the third-tier $10k AGIs might effectively act as co-conspirators with the latest model, and vice versa.
Why would you suppose that? The design space of AI is incredibly large and humans are clear counterexamples, so the question one ought to ask is: is there any fundamental reason why an AGI that refuses to coordinate would inevitably fall off the AI risk landscape?
I agree that coordination between mutually aligned AIs is plausible.
I think such coordination is less likely in our example because we can probably anticipate and avoid it for human-level AGI.
I also think there are strong commercial incentives to avoid building mutually aligned AGIs. You can’t sell (access to) a system if there is no reason to believe the system will help your customer. Rather, I expect systems to be fine-tuned for each task, as in the current paradigm. (The systems may successfully resist fine-tuning once they become sufficiently advanced.)
I’ll also add that two copies of the same system are not necessarily mutually aligned. See for example debate and other self-play algorithms.
I agree about the strong commercial incentives, but I don’t think we will be in a context where people will follow their incentives. After all, there are incredibly strong incentives not to make AGI at all until you can be very confident it is perfectly safe—strong enough that it’s probably not a good idea to pursue AI research at all until AI safety research is much more well-established than it is today—and yet here we are.
Basically, people won’t recognize their incentives, because people won’t realize how much danger they are in.
Hmm, in my model most of the x-risk is gone if there is no incentive to deploy. But I expect actors will deploy systems because their system is aligned with a proxy, which at least yields short-term gains. Maybe the crux is that you expect these actors to suffer a large private harm (death), whereas I expect a small private harm (for each system deployed, a marginal distributed harm to all of society)?
It makes no difference if the marginal distributed harm to all of society is so overwhelmingly large that your share of it is still death.
I’m using the colloquial meaning of ‘marginal’ = ‘not large’.
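Here is a toy payoff comparison of the two framings of that crux, with entirely made-up numbers (using “marginal” in the colloquial small-share sense just clarified):

```python
# Toy illustration of the crux above, with entirely made-up numbers.
# View A ("large private harm"): the deployer of a proxy-aligned system bears
#   something like the full harm (death), so deploying looks clearly negative.
# View B ("small private harm"): the harm is spread over all of society, so the
#   deployer's private share of it is tiny and deploying looks positive.
private_gain = 1.0          # short-term gain to the actor from deploying
total_harm = 1_000.0        # harm to society from one deployment
n_actors = 1_000_000        # roughly, everyone who bears the distributed harm

net_view_a = private_gain - total_harm             # bears the full harm
net_view_b = private_gain - total_harm / n_actors  # bears only a 1/n share

print(f"View A net payoff to the deployer: {net_view_a:+.3f}")
print(f"View B net payoff to the deployer: {net_view_b:+.3f}")
# Under View B each actor's incentive is to deploy even though every deployment
# makes society as a whole worse off -- a standard tragedy-of-the-commons shape.
```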
I put non-trivial probability mass (>10%) on a relativistically expanding bubble of Xonium (computronium, hedonium, etc.) within 1 second of AGI.
While big jumps are rarer than small jumps, they cover more distance, so it is quite possible that we go from a world like this one (except with self-driving cars and a few other narrow-AI applications) to something smart enough to bootstrap very fast.
One second is preposterous! It’d take at least a minute to get up to relativistic speeds; keep in mind it’ll have to build infrastructure as it goes along, and it’ll start off using human-built tools which aren’t capable of such speeds. No way it can build such powerful tools with human tools in the space of a second.
I’d be surprised if it managed to convert the surface of the planet in less than 10 minutes, to be honest. It might get to the moon in an hour, and have crippled our ability to fight back within 20 seconds, but it’s just intelligent, not magical. Getting to relativistic speeds still requires energy, and Xonium still needs to be made of something.
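As a quick sanity check on the “moon in an hour” figure, here is the implied speed, assuming the standard average Earth–Moon distance of about 384,400 km (my figure, not the commenter’s):

```python
# Back-of-the-envelope check on "it might get to the moon in an hour".
# Assumes the average Earth-Moon distance of ~384,400 km (my figure).
distance_km = 384_400
time_s = 3600                        # one hour
c_km_s = 299_792                     # speed of light in km/s

speed_km_s = distance_km / time_s    # ~107 km/s
fraction_of_c = speed_km_s / c_km_s  # ~0.00036 c

print(f"implied speed: {speed_km_s:.0f} km/s ({fraction_of_c:.5f} c)")
# Even the moon-in-an-hour scenario is roughly 2800x slower than light, i.e.
# nowhere near relativistic, consistent with "it's just intelligent, not magical".
```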
The actual bootstrapping takes months, years or even decades, but it might only take 1 second for the fate of the universe to be locked in.