If you really think that through, then in the long term it means a permanent ban on compute that takes computers back to the level they were at in the 1970s, and a global enforcement system to keep it that way.
Furthermore, there are implications for space colonization: if any group can blast away from the Sol system and colonize a different solar system, they can make an ASI there, and it can come back and kill us all. Similarly, any development on other planets must be monitored for violations of the compute limit, and we must also monitor technology that would indirectly lead to a violation of those limits, i.e. any technology that would allow people to build an advanced modern chip fab without the enforcers detecting it.
In short, you are trying to impose an equilibrium which is not incentive-compatible—every agent with sufficient power to build ASI has an incentive to do so for purely instrumental reasons. So, in the long term, the only way to not build a misaligned, dangerous ASI is to build an aligned ASI.
However, that is a long-term problem. In the short term, there are a very limited number of groups who can build it, so they can probably coordinate.
We decided to restrict nuclear power to the point where it’s rare in order to prevent nuclear proliferation. We decided to ban biological weapons, almost fully successfully. We can ban things that have strong local incentives, and I think that ignoring that, and claiming that slowing down or stopping can’t happen, is giving up on perhaps the most promising avenue for reducing existential risk from AI. (And this view helps in accelerating race dynamics, so even if I didn’t think it was substantively wrong, I’d be confused as to why it’s useful to actively promote it as an idea.)
This is enforced by the USA though, and the USA is a nuclear power with global reach.
No, it was and is a global treaty enforced multilaterally, alongside a number of test bans and arms-reduction treaties. For each, there is a strong local incentive for states—including the US—to defect, but the existence of a treaty allows global cooperation.
With AGI, of course, we have strong reasons to think that the payoff matrix looks something like the following:
             Cooperate       Defect
Cooperate    (0, 0)          (-∞, 5-∞)
Defect       (5-∞, -∞)       (-∞, -∞)
So yes, there’s a local incentive to defect, but it’s actually a prisoner’s dilemma where the best case for defecting is identical to suicide.
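As a rough sketch of this argument, the matrix above can be encoded and checked directly. Here `float("-inf")` stands in for -∞, "cooperate" means refraining from building ASI, and the entry `5 - INF` reflects that the finite upside of defecting is wiped out by the infinite downside of a misaligned ASI; the names and structure are my own illustration, not anything from the original comment:

```python
# Payoff matrix from the comment above; entries are (row player, column player).
INF = float("inf")

payoffs = {
    ("cooperate", "cooperate"): (0, 0),
    ("cooperate", "defect"):    (-INF, 5 - INF),   # 5 - inf == -inf
    ("defect",    "cooperate"): (5 - INF, -INF),
    ("defect",    "defect"):    (-INF, -INF),
}

def row_payoff(my_move, their_move):
    """Payoff to the row player for a given pair of moves."""
    return payoffs[(my_move, their_move)][0]

# Against either opponent move, cooperating is at least as good as defecting:
# the "local incentive to defect" never actually pays off in this matrix.
for their_move in ("cooperate", "defect"):
    assert row_payoff("cooperate", their_move) >= row_payoff("defect", their_move)
```

Note that under these payoffs, defection is not even a dominant strategy in expectation, which is the point being made: the best case for defecting collapses to -∞.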
One of the reasons I wrote this post is that I don’t believe regulation alone can solve this kind of problem (though I’m still pro-regulation). I believe we need to reach a common understanding of which things are so stupid that no one in their right mind would ever do them (see my reply to jbash). To use your space colonization example: we certainly can’t regulate what people do somewhere in outer space. But if we survive long enough to get there, then we have either solved alignment or we have finally realized that it’s not possible, which will hopefully be common knowledge by then.
Let’s say someone finds a way to create a black hole, but there’s no way to contain it. Maybe it’s even relatively easy for some reason—say it costs 10 million dollars or so. It’s probably not possible to prevent everyone forever from creating one, but the best—IMO the only—option to prevent Earth from being destroyed immediately is to make it absolutely clear to everyone that creating a black hole is suicidal. There is no guarantee that this will hold forever, but given the facts (doable, uncontainable), it’s the only alternative that doesn’t involve killing everyone else or locking them up forever.
We may need to restrict access to computing power somehow until we solve alignment, so that not every suicidal terrorist can easily create an AGI at some point. I don’t think we’ll have to go back to the 1970s, though. As I wrote, I think there’s a lot of potential in the AI we already have, and in narrow but powerful future AIs.
the best—IMO the only—option to prevent Earth from being destroyed immediately is to make it absolutely clear to everyone that creating a black hole is suicidal.
I am pretty sure you are just wrong about this: there are people who would gladly pay $10M to end the world, or who would use it as a blackmail threat, and eventually a threat-ignorer would meet a credible threatener, etc.
People are a bunch of unruly, dumb barely-evolved monkeys and the left tail of human stupidity and evil would blow your mind.
You may be right about that. Still, I don’t see any better alternative. We’re apes with too much power already, and we’re getting more powerful by the minute. Even without AGI, there are plenty of ways to end humanity (e.g. bioweapons, nanobots, nuclear war, bio lab accidents …) Either we learn to overcome our ape-brain impulses and restrict ourselves, or we’ll kill ourselves. As long as we haven’t killed ourselves, I’ll push towards the first option.
I do! Aligned benevolent AI!
Well, yes, of course! Why didn’t I think of it myself? /s
Honestly, “aligned benevolent AI” is not a “better alternative” for the problem I’m writing about in this post, which is that we’ll be able to develop an AGI before we have solved alignment. I’m totally fine with someone building an aligned AGI (assuming that it is really aligned, not just seemingly aligned). The problem is, this is very hard to do, and timelines are likely very short.
There are at least two options for developing aligned AGI, in the context of this discussion:

1. Slow down capabilities and speed up alignment just enough that we solve alignment before developing AGI.
   - e.g. the MTAIR project, in this paper, models the effect of a fire alarm for HLMI as “extra time” that speeds up safety research, leading to a higher chance that it succeeds before the HLMI timeline
   - this seems intuitively more feasible, hence more likely
2. Stop capabilities altogether (this is what you’re recommending in the OP).
   - this seems intuitively far less feasible, hence ~unlikely (I interpret e.g. HarrisonDurland’s comment as elaborating on this intuition)
What I don’t yet understand is why you’re pushing for #2 over #1. You would probably be more persuasive if you addressed, e.g., why my intuition that #1 is more feasible than #2 is wrong.
Edited to add: Matthijs Maas’ Strategic Perspectives on Transformative AI Governance: Introduction has this (oversimplified) mapping of strategic perspectives. I think you’d probably fall under (technical: pessimistic or very pessimistic; governance: very optimistic), while my sense is that most LWers (me included) are either pessimistic or uncertain on both axes, so there’s that inferential gap to address in the OP.
I’m obviously all for “slowing down capabilities”. I’m not for “stopping capabilities altogether”, but for selecting which capabilities we want to develop and which to avoid (e.g. strategic awareness). I’m totally for “solving alignment before AGI” if that’s possible.
I’m very pessimistic about technical alignment in the near term, but not “optimistic” about governance. “Death with dignity” is not really a strategy, though. If anything, my favorite strategy in the table is “improve competence, institutions, norms, trust, and tools, to set the stage for right decisions”: If we can create a common understanding that developing a misaligned AGI would be really stupid, maybe the people who have access to the necessary technology won’t do it, at least for a while.
The point of my post here is not to solve the whole problem. I just want to point out that the common “either AGI or bad future” is wrong.
Sure, I mostly agree. To repeat part of my earlier comment: you would probably be more persuasive if you addressed, e.g., why my intuition that #1 is more feasible than #2 is wrong. In other words, I’m giving you feedback on how to make your post more persuasive to the LW audience. This sort of response (“Well, yes, of course! Why didn’t I think of it myself? /s”) doesn’t really persuade readers; bridging inferential gaps would.
Good point! Satirical reactions are not appropriate in comments; I apologize. However, I don’t think that arguing why alignment is difficult would fit into this post. I clearly stated this assumption in the introduction as a basis for my argument, assuming that LW readers were familiar with the problem. Here are some resources that explain why I don’t think we can solve alignment in the next 5–10 years: https://intelligence.org/2016/12/28/ai-alignment-why-its-hard-and-where-to-start/, https://aisafety.info?state=6172_, https://www.lesswrong.com/s/TLSzP4xP42PPBctgw/p/3gAccKDW6nRKFumpP