One of the reasons I wrote this post is that I don’t believe regulation can solve this kind of problem (though I’m still pro-regulation). I believe we need to reach a common understanding of which things are so stupid that no one in their right mind would ever do them (see my reply to jbash). To use your space colonization example: we certainly can’t regulate what people do somewhere in outer space. But if we survive long enough to get there, then we will either have solved alignment or finally realized that it’s not possible, and hopefully that will be common knowledge by then.
Let’s say someone finds a way to create a black hole, but there’s no way to contain it. Maybe it’s even relatively easy for some reason—say it costs 10 million dollars or so. It’s probably not possible to prevent everyone from creating one forever, but the best—IMO the only—option to prevent Earth from being destroyed immediately is to make it absolutely clear to everyone that creating a black hole is suicidal. There is no guarantee that this will hold forever, but given the facts (doable, uncontainable), it’s the only alternative that doesn’t involve killing everyone else or locking them up forever.
We may need to restrict access to computing power somehow until we solve alignment, so that not every suicidal terrorist can easily create an AGI at some point. I don’t think we’ll have to go back to the 1970s, though. As I wrote, I think there’s a lot of potential in the AI we already have, and in narrow but powerful future AIs.
I am pretty sure you are just wrong about this: there are people who would gladly pay $10M to end the world, or who would use it as a blackmail threat, and sooner or later a threat-ignorer would meet a credible threatener, etc.
People are a bunch of unruly, dumb, barely-evolved monkeys, and the left tail of human stupidity and evil would blow your mind.
You may be right about that. Still, I don’t see any better alternative. We’re apes with too much power already, and we’re getting more powerful by the minute. Even without AGI, there are plenty of ways to end humanity (e.g. bioweapons, nanobots, nuclear war, bio lab accidents, …). Either we learn to overcome our ape-brain impulses and restrain ourselves, or we’ll kill ourselves. As long as we haven’t killed ourselves, I’ll push for the first option.
I do! Aligned benevolent AI!
Well, yes, of course! Why didn’t I think of it myself? /s
Honestly, “aligned benevolent AI” is not a “better alternative” for the problem I’m writing about in this post, which is that we’ll be able to develop an AGI before we have solved alignment. I’m totally fine with someone building an aligned AGI (assuming it is really aligned, not just seemingly aligned). The problem is that this is very hard to do, and timelines are likely very short.
There are at least two options for developing aligned AGI, in the context of this discussion:

1. Slow down capabilities and speed up alignment just enough that we solve alignment before developing AGI
   - e.g. the MTAIR project, in this paper, models the effect of a fire alarm for HLMI as “extra time” that speeds up safety research, leading to a higher chance that it succeeds before HLMI arrives
   - this seems intuitively more feasible, hence more likely
2. Stop capabilities altogether—this is what you’re recommending in the OP
   - this seems intuitively far less feasible, hence ~unlikely (I interpret e.g. HarrisonDurland’s comment as elaborating on this intuition)
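The qualitative claim behind option #1—that “extra time” before HLMI raises the chance that safety research finishes first—can be illustrated with a toy Monte Carlo sketch. The distributions and parameters below are purely illustrative assumptions of mine, not taken from the MTAIR paper:

```python
import random

def p_alignment_first(extra_years: float, trials: int = 100_000, seed: int = 0) -> float:
    """Toy model: estimate the probability that alignment is solved before
    HLMI arrives, given `extra_years` of delay added to the HLMI timeline.

    Both timelines are drawn from made-up log-normal distributions,
    chosen only so that alignment tends to take longer than HLMI.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        t_hlmi = rng.lognormvariate(2.7, 0.5)   # illustrative: median ~15 years to HLMI
        t_align = rng.lognormvariate(3.2, 0.5)  # illustrative: median ~25 years to alignment
        if t_align < t_hlmi + extra_years:
            wins += 1
    return wins / trials
```

With these made-up distributions, adding a few years of “extra time” visibly raises the success probability. The point is the shape of the argument, not an estimate of real timelines.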
What I don’t yet understand is why you’re pushing for #2 over #1. You would probably be more persuasive if you addressed e.g. why my intuition that #1 is more feasible than #2 is wrong.
Edited to add: Matthijs Maas’ Strategic Perspectives on Transformative AI Governance: Introduction has this (oversimplified) mapping of strategic perspectives. I think you’d probably fall under (technical: pessimistic or very pessimistic; governance: very optimistic), while my sense is that most LWers (me included) are either pessimistic or uncertain on both axes, so there’s that inferential gap to address in the OP.
I’m obviously all for “slowing down capabilities”. I’m not for “stopping capabilities altogether”, but for selecting which capabilities we want to develop and which to avoid (e.g. strategic awareness). I’m totally for “solving alignment before AGI” if that’s possible.
I’m very pessimistic about technical alignment in the near term, but not “optimistic” about governance. “Death with dignity” is not really a strategy, though. If anything, my favorite strategy in the table is “improve competence, institutions, norms, trust, and tools, to set the stage for right decisions”: If we can create a common understanding that developing a misaligned AGI would be really stupid, maybe the people who have access to the necessary technology won’t do it, at least for a while.
The point of my post here is not to solve the whole problem. I just want to point out that the common framing “either AGI or bad future” is wrong.
Sure, I mostly agree. To repeat part of my earlier comment: you would probably be more persuasive if you addressed e.g. why my intuition that #1 is more feasible than #2 is wrong. In other words, I’m giving you feedback on how to make your post more persuasive to the LW audience. This sort of response (“Well, yes, of course! Why didn’t I think of it myself? /s”) doesn’t really persuade readers; bridging inferential gaps would.
Good point! Satirical reactions are not appropriate in comments; I apologize. However, I don’t think that arguing for why alignment is difficult would fit into this post. I clearly stated this assumption in the introduction as a basis for my argument, expecting that LW readers would be familiar with the problem. Here are some resources explaining why I don’t think we can solve alignment in the next 5-10 years: https://intelligence.org/2016/12/28/ai-alignment-why-its-hard-and-where-to-start/, https://aisafety.info?state=6172_, https://www.lesswrong.com/s/TLSzP4xP42PPBctgw/p/3gAccKDW6nRKFumpP