Nice! Thanks! I’ll give my commentary on your commentary, also point by point. Your stuff italicized, my stuff not. Warning: Wall of text incoming! :)
*I think raw intelligence, while important, is not the primary factor that explains why humanity-as-a-species is much more powerful than chimpanzees-as-a-species. Notably, humans were once much less powerful, in our hunter-gatherer days, but over time, through the gradual process of accumulating technology, knowledge, and culture, humans now possess vast productive capacities that far outstrip our ancient powers.*
*Similarly, our ability to coordinate through language also plays a huge role in explaining our power compared to other animals. But, on a first approximation, other animals can’t coordinate at all, making this distinction much less impressive. The first AGIs we construct will be born into a culture already capable of coordinating, and sharing knowledge, making the potential power difference between AGI and humans relatively much smaller than between humans and other animals, at least at first.*
I don’t think I understand this argument. Yes, humans can use language to coordinate & benefit from cultural evolution, so an AI that can do that too (but is otherwise unexceptional) would have no advantage. But the possibility we are considering is that AI might be to humans what humans are to monkeys; thus, if the difference between humans and monkeys is greater intelligence allowing them to accumulate language, there might be some similarly important difference between AIs and humans. For example, language is a tool that lets humans learn from the experience of others, but AIs can literally learn from the experience of others—via the mechanism of having many copies that share weights and gradient updates! They can also e.g. graft more neurons onto an existing AI to make it smarter, think at greater serial speed, integrate calculators and other programs into their functioning and learn to use them intuitively as part of their regular thought processes… I won’t be surprised if somewhere in the grab bag of potential advantages AIs have over humans is one (or several added together) as big as the language advantage humans have over monkeys.
Plus, there’s language itself. It’s not a binary, it’s a spectrum; monkeys can use it too, to some small degree. And some humans can use it more/better than others. Perhaps AIs will (eventually, and perhaps even soon) be better at using language than the best humans.
*Consequently, the first slightly smarter-than-human agent will probably not be able to leverage its raw intelligence to unilaterally take over the world, for pretty much the same reason that an individual human would not be able to unilaterally take over a band of chimps, in the state of nature, despite the intelligence advantage of the human.*
Here’s how I think we should think about it. Taboo “intelligence.” Instead we just have a list of metrics a, b, c, … z, some of which overlap, some of which are subsets of others, etc. One of these metrics, then, is “takeover ability (intellectual component).” This metric, combined with “takeover ability (resources),” “takeover ability (social status),” and maybe a few others that track “exogenous” factors about how others treat the AI and what resources it has, yields “overall takeover ability.”
Now I claim (1) takeover is a tournament (blog post TBD, but see my writings about lessons from the conquistadors), and I cite this as support for claim (2): takeover would be easy for AIs. By that I mean: IF AIs were mildly superhuman in the intellectual component of takeover ability, they would plausibly start off with enough of the other components that they would be able to secure more of those components fairly quickly, stay out of trouble, etc. until they could actually take over—in other words, their overall takeover ability would be mildly superhuman as well.
(I haven’t argued for this much yet but I plan to in future posts. Also I expect some people will find it obvious, and maybe you are one such person.)
Now, how should we think about AI timelines-till-human-level-takeover-ability-(intellectual)?
Same way we think about AI timelines for AGI, or TAI, or whatever. I mean obviously there are differences, but I don’t think we have reason to think that the intellectual component of takeover ability is vastly more difficult than e.g. being human-level AGI, or being able to massively accelerate world GDP, or being able to initiate recursive self-improvement or an R&D acceleration.
I mean it might be. It’s a different metric, after all. But it also might come earlier than those things. It might be easier. And I have plausibility arguments to make for that claim in fact.
So anyhow I claim: We can redo all our timelines analyses with “slightly superhuman takeover ability (intellectual)” as the thing to forecast instead of TAI or AGI or whatever, and get roughly the same numbers. And then (I claim) this is tracking when we should worry about AI takeover. Yes, by a single AI system, if only one exists; if multiple exist then by multiple.
We can hope that we’ll get really good AI alignment research assistants before we get AIs good at taking over… but that’s just a hope at this point; it totally could come in the opposite order and I have arguments that it would.
*There’s a large range of human intelligence, such that it makes sense to talk about AI slowly going from 50th percentile to 99.999th percentile on pretty much any intellectual task, rather than AI suddenly jumping to superhuman levels after a single major insight. In cases where progress in performance does happen rapidly, the usual reason is that there wasn’t much effort previously being put into getting better at the task.*
*The case of AlphaGo is instructive here: improving the SOTA on Go bots is not very profitable. We should expect, therefore, that there will be relatively few resources being put into that task, compared to the overall size of the economy. However, if a single rich company, like Google, at some point does decide to invest considerable resources into improving Go performance, then we could easily observe a discontinuity in progress. Yet, this discontinuity in output merely reflects a discontinuity in inputs, not a discontinuity as a response to small changes in those inputs, as is usually a prerequisite for foom in theoretical models.*
*Hardware progress and experimentation are much stronger drivers of AI progress than novel theoretical insights. The most impressive insights, like backpropagation and transformers, are probably in our past. And as the field becomes more mature, it will likely become even harder to make important theoretical discoveries.*
*These points make the primacy of recursive self-improvement, and as a consequence, unipolarity in AI takeoff, less likely in the future development of AI. That’s because hardware progress and AI experimentation are, for the most part, society-wide inputs, which can be contributed by a wide variety of actors, don’t exhibit strong feedback loops on an individual level, and more-or-less have smooth responses to small changes in their inputs. Absent some way of making AI far better via a small theoretical tweak, it seems that we should expect smooth, gradual progress by default, even if overall economic growth becomes very high after the invention of AGI.*
I claim this argument is a motte and bailey. The motte is the first three paragraphs, where you give good, sensible reasons to think that discontinuities and massive conceptual leaps, while possible, are not typical. The bailey is the last paragraph, where you suggest that we can therefore conclude unipolar takeoff is unlikely and that progress will go the way Paul Christiano thinks it’ll go instead of the way Yudkowsky thinks it’ll go. I have sat down to make toy models of what takeoff might look like, and even with zero discontinuities and five-year spans of time to “cross the human range,” the situation looks qualitatively a lot more like Yudkowsky’s story than Christiano’s. Of course you shouldn’t take my word for it; also, just because the one or two models I made looked this way doesn’t mean I’m right, and maybe someone with different biases could make different models that would come out differently. But still. (Note: Part of why my models came out this way was that I was assuming stuff happens 5-15 years from now. Paul Christiano would agree, I think, that given this assumption takeoff would be pretty fast. I haven’t tried to model what things look like on 20+ year timelines.)
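For concreteness, here is a minimal sketch of what such a toy model might look like. This is an illustration only, not the actual models referred to above; the functional form and every parameter value are assumptions, chosen so that crossing the human range takes roughly five years with no discontinuities anywhere.

```python
# Toy takeoff model (illustrative only; all numbers are assumptions).
# Capability grows smoothly with research effort. Once AI capability is
# inside the human range, AI starts contributing research effort in
# proportion to how far it has climbed -- still a smooth function, so
# there are no discontinuities anywhere in the model.

def simulate(years=40, dt=0.01):
    capability = 0.5     # 1.0 = median human, 2.0 = top of the human range
    human_effort = 1.0   # constant human research input
    rate = 0.1           # capability gain per unit effort per year
    ai_leverage = 2.5    # chosen so the human range takes ~5 years to cross
    t, milestones = 0.0, {}
    checkpoints = [(1.0, "median human"), (2.0, "top human"),
                   (3.0, "one human-range beyond top human")]
    while t < years and capability < 3.0:
        ai_effort = max(0.0, capability - 1.0) * ai_leverage
        capability += rate * (human_effort + ai_effort) * dt
        t += dt
        for level, label in checkpoints:
            if capability >= level and label not in milestones:
                milestones[label] = round(t, 1)
    return milestones

print(simulate())
# Roughly: median human at ~year 5, top human at ~year 10, and then a
# whole extra human-range of progress in only ~2 further years. Smooth
# inputs, no jumps, and yet the endgame is compressed.
```

Obviously the qualitative conclusion depends on the assumed feedback strength; the point of the sketch is just that a fully continuous model can still pack most of its superhuman progress into a short window at the end.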
*There are strong pressures—including the principle of comparative advantage, diseconomies of scale, and gains from specialization—that incentivize making economic services narrow and modular, rather than general and all-encompassing. Illustratively, a large factory where each worker specializes in their particular role will be much more productive than a factory in which each worker is trained to be a generalist, even though no one understands any particular component of the production process very well.*
*What is true in human economics will apply to AI services as well. This implies we should expect something like Eric Drexler’s AI perspective, which emphasizes economic production across many agents who trade and produce narrow services, as opposed to monolithic agents that command and control.*
This may be our biggest disagreement. Drexler’s vision of comprehensive AI services is a beautiful fantasy IMO. Agents are powerful. There will be plenty of AI services, yes, but there will also be AI agents, and those are what we are worried about. And yes, it’s theoretically possible to develop the right AI services in advance to help us control the agents when they appear… but we’d best get started building them then, because they aren’t going to build themselves. And eyeballing the progress towards AI agents vs. useful interpretability tools etc., it’s not looking good.
*Having seen undeniable, large economic effects from AI, policymakers will eventually realize that AGI is important, and will launch massive efforts to regulate it. The current lack of concern almost certainly reflects the fact that powerful AI hasn’t arrived yet.*
*There’s a long history of people regulating industries after disasters—like nuclear energy—and, given the above theses, it seems likely that there will be at least a few “warning shots” which will provide a trigger for companies and governments to crack down and invest heavily into making things go the way they want.*
*(Note that this does not imply any sort of optimism about the effects of these regulations, only that they will exist and will have a large effect on the trajectory of AI)*
I agree in principle, but unfortunately it seems like things are going to happen fast enough (over the span of a few years at most) and soon enough (in the next decade or so, NOT in 30 years after the economy has already been transformed by narrow AI systems) that it really doesn’t seem like governments are going to do much by default. We still have the opportunity to plan ahead and get governments to do stuff! But I think if we sit on our asses, nothing of use will happen. (Probably there will be some regulation but it’ll be irrelevant like most regulation is.)
In particular I think that we won’t get any cool exciting scary AI takeover near-misses that cause massive crackdowns on the creation of AIs that could possibly take over, the way we did for nuclear power plants. Why would we? The jargon for this is “Sordid Stumble before Treacherous Turn.” It might happen, but we shouldn’t expect it by default, I think. Yes, before AIs are smart enough to take over, they will be dumber. But what matters is: before an AI is smart enough to take over and smart enough to realize this, will there be an AI that can’t take over but thinks it can? And “before” can’t be “two weeks before” either; it probably needs to be more like two months or two years, otherwise the dastardly plan won’t have time to go awry, be caught, be argued about, and then be regulated against. Also, the AI in question has to be scarily smart, otherwise its takeover attempt will fail so early that it won’t be registered as such; it’ll be like GPT-3 lying to users to get reward, or Facebook’s recommendation algorithm causing thousands of teenage girls to kill themselves, and people will be like “Oh yes, this was an error, good thing we train that sort of thing away, see how the system behaves better now.”
*The effect of the above points is not to provide us uniform optimism about AI safety, and our collective future. It is true that, if we accept the previous theses, then many of the points in Eliezer’s list of AI lethalities become far less plausible. But, equally, one could view these theses pessimistically, by thinking that they imply the trajectory of future AI is much harder to intervene on, and do anything about, relative to the Yudkowskian view.*
I haven’t gone through the list point by point, so I won’t comment on this. I agree that longer-timelines, slower-takeoff worlds are worlds we have less influence over, relative to other humans.
“I have sat down to make toy models ..”
reference?
? I am the reference, I’m describing a personal experience.
I meant, is there a link to where you’ve written this down somewhere? Maybe you just haven’t written it down.
I’ll send you a DM.
Markdown has syntax for quotes: a line with
> this
on it will look like a quote.

Can I get a link or two to read more about this incident?
It’s not so much an incident as a trend. I haven’t investigated it myself, but I’ve read lots of people making this claim, citing various studies, etc. See e.g. the documentary “The Social Dilemma,” featuring Tristan Harris.
There’s an academic literature on the subject now which I haven’t read but which you can probably find by googling.
I just did a quick search and found graphs like this:
Presumably not all of the increase in deaths is due to Facebook; presumably it’s multi-causal blah blah blah. But even if Facebook is responsible for a tiny fraction of the increase, that would mean Facebook was responsible for thousands of deaths.