It’s as good a time as any to reiterate my reasons for disagreeing with what I see as the Yudkowskian view of future AI. What follows isn’t intended as a rebuttal of any specific argument in this essay, but merely as a pointer for readers that may help explain why some people disagree with the conclusion and reasoning contained within.
I’ll provide my cruxes point by point:
I think raw intelligence, while important, is not the primary factor that explains why humanity-as-a-species is much more powerful than chimpanzees-as-a-species. Notably, humans were once much less powerful, in our hunter-gatherer days, but over time, through the gradual process of accumulating technology, knowledge, and culture, humans now possess vast productive capacities that far outstrip our ancient powers.
Similarly, our ability to coordinate through language also plays a huge role in explaining our power compared to other animals. But, on a first approximation, other animals can’t coordinate at all, making this distinction much less impressive. The first AGIs we construct will be born into a culture already capable of coordinating, and sharing knowledge, making the potential power difference between AGI and humans relatively much smaller than between humans and other animals, at least at first.
Consequently, the first slightly smarter-than-human agent will probably not be able to leverage its raw intelligence to unilaterally take over the world, for pretty much the same reason that an individual human would not be able to unilaterally take over a band of chimps, in the state of nature, despite the intelligence advantage of the human.
There’s a large range of human intelligence, such that it makes sense to talk about AI slowly going from 50th percentile to 99.999th percentile on pretty much any important general intellectual task, rather than AI suddenly jumping to superhuman levels after a single major insight. In cases where progress in performance does happen rapidly, the usual reason is that there wasn’t much effort previously being put into getting better at the task.
The case of AlphaGo is instructive here: improving the SOTA on Go bots is not very profitable. We should expect, therefore, that there will be relatively few resources being put into that task, compared to the overall size of the economy. However, if a single rich company, like Google, at some point does decide to invest considerable resources into improving Go performance, then we could easily observe a discontinuity in progress. Yet, this discontinuity in output merely reflects a discontinuity in inputs, not a discontinuity as a response to small changes in those inputs, as is usually a prerequisite for foom in theoretical models.
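To make the inputs-versus-response distinction concrete, here is a minimal toy model (my own illustration, with made-up numbers): the performance curve itself is smooth with diminishing returns, yet year-over-year progress still jumps when a single large actor suddenly increases investment.

```python
# Toy model: a smooth response to inputs can still produce a "discontinuity" in
# observed progress when the *inputs* jump (e.g. a rich company entering the field).
# Purely illustrative numbers.

import math

def performance(cumulative_investment):
    # Smooth, diminishing-returns response curve (no discontinuity in the function itself).
    return math.log1p(cumulative_investment)

annual_investment = [1.0] * 10 + [100.0] * 5   # a big actor enters in year 10
cumulative = 0.0
for year, spend in enumerate(annual_investment):
    prev = performance(cumulative)
    cumulative += spend
    gain = performance(cumulative) - prev
    print(f"year {year:2d}: invested {spend:6.1f}, performance gain {gain:.3f}")

# Output shows a sharp jump in yearly gains at year 10, even though performance()
# is smooth in its input: the jump lives in the inputs, not in the response function.
```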
Hardware progress and experimentation are much stronger drivers of AI progress than novel theoretical insights. The most impressive insights, like backpropagation and transformers, are probably in our past. And as the field becomes more mature, it will likely become even harder to make important theoretical discoveries.
These points make the primacy of recursive self-improvement, and as a consequence, unipolarity in AI takeoff, less likely in the future development of AI. That’s because hardware progress and AI experimentation are, for the most part, society-wide inputs, which can be contributed by a wide variety of actors, don’t exhibit strong feedback loops on an individual level, and more-or-less have smooth responses to small changes in their inputs. Absent some way of making AI far better via a small theoretical tweak, it seems that we should expect smooth, gradual progress by default, even if overall economic growth becomes very high after the invention of AGI.
[Update (June 2023): While I think these considerations are still important, I think the picture I painted in this section was misleading. I wrote about my views of AI services here.] There are strong pressures—including the principle of comparative advantage, diseconomies of scale, and gains from specialization—that incentivize making economic services narrow and modular, rather than general and all-encompassing. Illustratively, a large factory where each worker specializes in their particular role will be much more productive than a factory in which each worker is trained to be a generalist, even though no one understands any particular component of the production process very well.
What is true in human economics will apply to AI services as well. This implies we should expect something like Eric Drexler’s AI perspective, which emphasizes economic production across many agents who trade and produce narrow services, as opposed to monolithic agents that command and control.
Having seen undeniable, large economic effects from AI, policymakers will eventually realize that AGI is important, and will launch massive efforts to regulate it. The current lack of concern almost certainly reflects the fact that powerful AI hasn’t arrived yet.
There’s a long history of people regulating industries after disasters—like nuclear energy—and, given the above theses, it seems likely that there will be at least a few “warning shots” which will provide a trigger for companies and governments to crack down and invest heavily into making things go the way they want.
(Note that this does not imply any sort of optimism about the effects of these regulations, only that they will exist and will have a large effect on the trajectory of AI)
The effect of the above points is not to provide us uniform optimism about AI safety, and our collective future. It is true that, if we accept the previous theses, then many of the points in Eliezer’s list of AI lethalities become far less plausible. But, equally, one could view these theses pessimistically, by thinking that they imply the trajectory of future AI is much harder to intervene on, and do anything about, relative to the Yudkowskian view.
Similarly, our ability to coordinate through language also plays a huge role in explaining our power compared to other animals. But, on a first approximation, other animals can’t coordinate at all, making this distinction much less impressive. The first AGIs we construct will be born into a culture already capable of coordinating, and sharing knowledge, making the potential power difference between AGI and humans relatively much smaller than between humans and other animals, at least at first.
I basically buy the story that human intelligence is less useful than human coordination; i.e. it’s the intelligence of “humanity” the entity that matters, with the intelligence of individual humans relevant only as, like, subcomponents of that entity.
But… shouldn’t this mean you expect AGI civilization to totally dominate human civilization? They can read each other’s source code, and thus trust much more deeply! They can transmit information between them at immense bandwidths! They can clone their minds and directly learn from each other’s experiences!
Like, one scenario I visualize a lot is the NHS having a single ‘DocBot’, i.e. an artificial doctor run on datacenters that provides medical advice and decision-making for everyone in the UK (while still working with nurses and maybe surgeons and so on). Normally I focus on the way that it gets about three centuries of experience treating human patients per day, but imagine the difference in coordination capacity between DocBot and the BMA.
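As a rough sanity check on that “centuries of experience per day” figure, here is a back-of-the-envelope calculation; all inputs are assumptions for illustration, not official NHS statistics.

```python
# Back-of-the-envelope: how many GP-years of patient contact would a single
# UK-wide DocBot accumulate per day? All inputs are assumed, illustrative figures.

nhs_consultations_per_day = 1_200_000   # assumed: roughly a million-plus GP consultations/day
patients_per_gp_per_day = 25            # assumed: a busy human GP
working_days_per_year = 250

human_gp_patients_per_year = patients_per_gp_per_day * working_days_per_year
docbot_gp_years_per_day = nhs_consultations_per_day / human_gp_patients_per_year

print(f"~{docbot_gp_years_per_day:.0f} GP-years of experience per day")  # ~190 with these inputs
```

So the “three centuries per day” claim is the right order of magnitude; the exact figure just depends on the assumed consultation volume.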
Having seen undeniable, large economic effects from AI, policymakers will eventually realize that AGI is important, and will launch massive efforts to regulate it.
I think everyone expects this; the disagreement is often about the timescale on which it will arrive. See, for example, Elon Musk’s speech to the US National Governors Association, where he argues that the reactive regulation model will be too slow to handle the crisis.
But I think the even more important disagreement is on whether or not regulations should be expected to work. Ok, so you make it so that only corporations with large compliance departments can run AGI. How does that help? There was a tweet by Matt Yglesias a while ago that I can’t find now, which went something like: “a lot of smart people are worried about AI, and when you ask them what the government can do about it, they have no idea; this is an extremely wild situation from the perspective of a policy person.” A law that says “don’t run the bad code” is predicated on the ability to tell the good code from the bad code, which is the main thing we’re missing and don’t know how to get!
And if you say something like “ok, one major self-driving car accident will be enough to convince everyone to do the Butlerian Jihad and smash all the computers”, that’s really not how it looks to me. Like, the experience of COVID seems a lot like “people who were doing risky research in labs got out in front of everyone else to claim that the lab leak hypothesis was terrible and unscientific, and all of the anti-disinformation machinery was launched to suppress it, and it took a shockingly long time to even be able to raise the hypothesis, and it hasn’t clearly swept the field, and legislation to do something about risky research seems like it definitely isn’t a slam dunk.”
When we get some AI warning signs, I expect there are going to be people with the ability to generate pro-AI disinfo and a strong incentive to do so. I expect there to be significant latent political polarization which will tangle up any attempt to do something useful about it. And I expect there won’t be anything like the international coordination that it took to set up anti-nuclear-proliferation efforts, for the probably harder problem of setting up anti-AGI-proliferation efforts.
But… shouldn’t this mean you expect AGI civilization to totally dominate human civilization? They can read each other’s source code, and thus trust much more deeply! They can transmit information between them at immense bandwidths! They can clone their minds and directly learn from each other’s experiences!
This is 100% correct, and part of why I think the focus on superintelligence, while literally accurate, is bad for AI outreach. There’s a much simpler (and empirically, in my experience, more convincing) explanation of why we lose to even an AI with an IQ of 110. It is Dath Ilan, and we are Earth. Coordination is difficult for humans and the easy part for AIs.
I will note that Eliezer wrote That Alien Message a long time ago, I think in part to try to convey the issue to people with this perspective, but it’s mostly about “information-theoretic bounds are probably not going to be tight” in a simulation-y universe rather than “here’s what coordination between computers looks like today”. I do think the coordination point would be good to include in more of the intro materials.
But… shouldn’t this mean you expect AGI civilization to totally dominate human civilization? They can read each other’s source code, and thus trust much more deeply! They can transmit information between them at immense bandwidths! They can clone their minds and directly learn from each other’s experiences!
I don’t think it’s obvious that this means that AGI is more dangerous, because it means that for a fixed total impact of AGI, the AGI doesn’t have to be as competent at individual thinking (because it leans relatively more on group thinking). And so at the point where the AGIs are becoming very powerful in aggregate, this argument pushes us away from thinking they’re good at individual thinking.
Also, it’s not obvious that early AIs will actually be able to do this if their creators don’t find a way to train them to have this affordance. Current ML doesn’t normally produce AIs which can helpfully share mind-states, and it probably requires non-trivial effort to hook them up correctly to be able to share mind-state.
They can read each other’s source code, and thus trust much more deeply!
Being able to read source code doesn’t automatically increase trust—you also have to be able to verify that the code being shared with you actually governs the AGI’s behavior, despite that AGI’s incentives and abilities to fool you.
(Conditional on the AGIs having strongly aligned goals with each other, sure, this degree of transparency would help them with pure coordination problems.)
Nice! Thanks! I’ll give my commentary on your commentary, also point by point. Your stuff italicized, my stuff not. Warning: Wall of text incoming! :)
I think raw intelligence, while important, is not the primary factor that explains why humanity-as-a-species is much more powerful than chimpanzees-as-a-species. Notably, humans were once much less powerful, in our hunter-gatherer days, but over time, through the gradual process of accumulating technology, knowledge, and culture, humans now possess vast productive capacities that far outstrip our ancient powers.
Similarly, our ability to coordinate through language also plays a huge role in explaining our power compared to other animals. But, on a first approximation, other animals can’t coordinate at all, making this distinction much less impressive. The first AGIs we construct will be born into a culture already capable of coordinating, and sharing knowledge, making the potential power difference between AGI and humans relatively much smaller than between humans and other animals, at least at first.
I don’t think I understand this argument. Yes, humans can use language to coordinate & benefit from cultural evolution, so an AI that can do that too (but is otherwise unexceptional) would have no advantage. But the possibility we are considering is that AI might be to humans what humans are to monkeys; thus, if the difference between humans and monkeys is greater intelligence allowing them to accumulate language, there might be some similarly important difference between AIs and humans. For example, language is a tool that lets humans learn from the experience of others, but AIs can literally learn from the experience of others—via the mechanism of having many copies that share weights and gradient updates! They can also e.g. graft more neurons onto an existing AI to make it smarter, think at greater serial speed, integrate calculators and other programs into their functioning and learn to use them intuitively as part of their regular thought processes… I won’t be surprised if somewhere in the grab bag of potential advantages AIs have over humans is one (or several added together) as big as the language advantage humans have over monkeys.
Plus, there’s language itself. It’s not a binary, it’s a spectrum; monkeys can use it too, to some small degree. And some humans can use it more/better than others. Perhaps AIs will (eventually, and perhaps even soon) be better at using language than the best humans.
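For what it’s worth, the “copies that share weights and gradient updates” point above is already how ordinary data-parallel training works. Here is a minimal sketch (mine, on a toy linear model, not any specific system): several copies see different data, pool their gradients, and every copy learns from every other copy’s experience on each step.

```python
# Minimal sketch of "copies literally sharing experience": vanilla data-parallel
# SGD on a toy linear regression problem. Each copy gets its own batch ("experience"),
# but the update is pooled, so all copies learn from all of the data.

import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([2.0, -3.0])
shared_weights = np.zeros(2)          # one set of weights, mirrored across all copies
n_copies, lr = 4, 0.1

for step in range(200):
    grads = []
    for _ in range(n_copies):         # each copy gets its own data
        x = rng.normal(size=(8, 2))
        y = x @ true_w + rng.normal(scale=0.1, size=8)
        err = x @ shared_weights - y
        grads.append(x.T @ err / len(y))       # (half the) squared-error gradient for this batch
    shared_weights -= lr * np.mean(grads, axis=0)  # pooled update: all copies learn together

print("learned:", shared_weights, "target:", true_w)
```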
Consequently, the first slightly smarter-than-human agent will probably not be able to leverage its raw intelligence to unilaterally take over the world, for pretty much the same reason that an individual human would not be able to unilaterally take over a band of chimps, in the state of nature, despite the intelligence advantage of the human.
Here’s how I think we should think about it. Taboo “intelligence.” Instead we just have a list of metrics a, b, c, … z, some of which are overlapping, some of which are subsets of others, etc. One of these metrics, then, is “takeover ability (intellectual component).” This metric, when combined with “takeover ability (resources),” “Takeover ability (social status)” and maybe a few others that track “exogenous” factors about how others treat the AI and what resources it has, combine together to create “overall takeover ability.”
Now, I claim, (1) Takeover is a tournament (blog post TBD, but see my writings about lessons from the conquistadors) and I cite this as support for claim (2) takeover would be easy for AIs, by which I mean, IF AIs were mildly superhuman in the intellectual component of takeover ability, they would plausibly start off with enough of the other components that they would be able to secure more of those other components fairly quickly, stay out of trouble, etc. until they could actually take over—in other words, their overall takeover ability would be mildly superhuman as well.
(I haven’t argued for this much yet but I plan to in future posts. Also I expect some people will find it obvious, and maybe you are one such person.)
Now, how should we think about AI timelines-till-human-level-takeover-ability-(intellectual)?
Same way we think about AI timelines for AGI, or TAI, or whatever. I mean obviously there are differences, but I don’t think we have reason to think that the intellectual component of takeover ability is vastly more difficult than e.g. being human-level AGI, or being able to massively accelerate world GDP, or being able to initiate recursive self-improvement or an R&D acceleration.
I mean it might be. It’s a different metric, after all. But it also might come earlier than those things. It might be easier. And I have plausibility arguments to make for that claim in fact.
So anyhow I claim: We can redo all our timelines analyses with “slightly superhuman takeover ability (intellectual)” as the thing to forecast instead of TAI or AGI or whatever, and get roughly the same numbers. And then (I claim) this is tracking when we should worry about AI takeover. Yes, by a single AI system, if only one exists; if multiple exist then by multiple.
We can hope that we’ll get really good AI alignment research assistants before we get AIs good at taking over… but that’s just a hope at this point; it totally could come in the opposite order and I have arguments that it would.
There’s a large range of human intelligence, such that it makes sense to talk about AI slowly going from 50th percentile to 99.999th percentile on pretty much any intellectual task, rather than AI suddenly jumping to superhuman levels after a single major insight. In cases where progress in performance does happen rapidly, the usual reason is that there wasn’t much effort previously being put into getting better at the task.
The case of AlphaGo is instructive here: improving the SOTA on Go bots is not very profitable. We should expect, therefore, that there will be relatively few resources being put into that task, compared to the overall size of the economy. However, if a single rich company, like Google, at some point does decide to invest considerable resources into improving Go performance, then we could easily observe a discontinuity in progress. Yet, this discontinuity in output merely reflects a discontinuity in inputs, not a discontinuity as a response to small changes in those inputs, as is usually a prerequisite for foom in theoretical models.
Hardware progress and experimentation are much stronger drivers of AI progress than novel theoretical insights. The most impressive insights, like backpropagation and transformers, are probably in our past. And as the field becomes more mature, it will likely become even harder to make important theoretical discoveries.
These points make the primacy of recursive self-improvement, and as a consequence, unipolarity in AI takeoff, less likely in the future development of AI. That’s because hardware progress and AI experimentation are, for the most part, society-wide inputs, which can be contributed by a wide variety of actors, don’t exhibit strong feedback loops on an individual level, and more-or-less have smooth responses to small changes in their inputs. Absent some way of making AI far better via a small theoretical tweak, it seems that we should expect smooth, gradual progress by default, even if overall economic growth becomes very high after the invention of AGI.
I claim this argument is a motte and bailey. The motte is the first three paragraphs, where you give good sensible reasons to think that discontinuities and massive conceptual leaps, while possible, are not typical. The bailey is the last paragraph where you suggest that we can therefore conclude unipolar takeoff is unlikely and that progress will go the way Paul Christiano thinks it’ll go instead of the way Yudkowsky thinks it’ll go. I have sat down to make toy models of what takeoff might look like, and even with zero discontinuities and five-year-spans of time to “cross the human range” the situation looks qualitatively a lot more like Yudkowsky’s story than Christiano’s. Of course you shouldn’t take my word for it, and also just because the one or two models I made looked this way doesn’t mean I’m right, maybe someone with different biases could make different models that would come out differently. But still. (Note: Part of why my models came out this way was that I was assuming stuff happens in 5-15 years from now. Paul Christiano would agree, I think, that given this assumption takeoff would be pretty fast. I haven’t tried to model what things look like on 20+ year timelines.)
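To give a flavor of what such a toy model can look like (this is my own sketch with made-up parameters, not Daniel’s actual model), here is a fully continuous capability curve in which AI gradually takes over AI research once it is better than the best human researcher:

```python
# A toy takeoff model in the spirit described above -- my own sketch with made-up
# parameters, not Daniel's actual model. Capability grows continuously (no
# discontinuities), but once AI exceeds the best human researcher, further progress
# scales with AI capability itself (AI doing the AI research).

human_low, human_high = 1.0, 2.0   # arbitrary units: median human to best human researcher
r = 0.14                           # calibrated so crossing the human range takes ~5 years
capability, t, dt = human_low, 0.0, 0.001
milestones = {}

while capability < 100 * human_high and t < 40:
    research_speed = max(1.0, capability / human_high)   # AI gradually dominates AI research
    capability += r * capability * research_speed * dt
    t += dt
    for name, level in [("best human", human_high),
                        ("10x best human", 10 * human_high),
                        ("100x best human", 100 * human_high)]:
        if capability >= level and name not in milestones:
            milestones[name] = round(t, 1)

print(milestones)
# With these parameters: ~5 years to cross the human range, several more years to reach
# 10x the best human, and then well under a year from 10x to 100x -- smooth inputs, but
# the late stages still look abrupt from a human perspective.
```

Nothing in this run is discontinuous, yet the interval in which AI is “roughly human-level” is brief compared to what comes after, which is the qualitative point.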
There are strong pressures—including the principle of comparative advantage, diseconomies of scale, and gains from specialization—that incentivize making economic services narrow and modular, rather than general and all-encompassing. Illustratively, a large factory where each worker specializes in their particular role will be much more productive than a factory in which each worker is trained to be a generalist, even though no one understands any particular component of the production process very well.
What is true in human economics will apply to AI services as well. This implies we should expect something like Eric Drexler’s AI perspective, which emphasizes economic production across many agents who trade and produce narrow services, as opposed to monolithic agents that command and control.
This may be our biggest disagreement. Drexler’s vision of comprehensive AI services is a beautiful fantasy IMO. Agents are powerful. There will be plenty of AI services, yes, but there will also be AI agents, and those are what we are worried about. And yes it’s theoretically possible to develop the right AI services in advance to help us control the agents when they appear… but we’d best get started building them then, because they aren’t going to build themselves. And eyeballing the progress towards AI agents vs. useful interpretability tools etc., it’s not looking good.
Having seen undeniable, large economic effects from AI, policymakers will eventually realize that AGI is important, and will launch massive efforts to regulate it. The current lack of concern almost certainly reflects the fact that powerful AI hasn’t arrived yet.
There’s a long history of people regulating industries after disasters—like nuclear energy—and, given the above theses, it seems likely that there will be at least a few “warning shots” which will provide a trigger for companies and governments to crack down and invest heavily into making things go the way they want.
(Note that this does not imply any sort of optimism about the effects of these regulations, only that they will exist and will have a large effect on the trajectory of AI)
I agree in principle, but unfortunately it seems like things are going to happen fast enough (over the span of a few years at most) and soon enough (in the next decade or so, NOT in 30 years after the economy has already been transformed by narrow AI systems) that it really doesn’t seem like governments are going to do much by default. We still have the opportunity to plan ahead and get governments to do stuff! But I think if we sit on our asses, nothing of use will happen. (Probably there will be some regulation but it’ll be irrelevant like most regulation is.)
In particular I think that we won’t get any cool exciting scary AI takeover near-misses that cause massive crackdowns on the creation of AIs that could possibly take over, the way we did for nuclear power plants. Why would we? The jargon for this is “Sordid Stumble before Treacherous Turn.” It might happen but we shouldn’t expect it by default I think. Yes, before AIs are smart enough to take over, they will be dumber. But what matters is: Before an AI is smart enough to take over and smart enough to realize this, will there be an AI that can’t take over but thinks it can? And “before” can’t be “two weeks before” either, it probably needs to be more like two months or two years, otherwise the dastardly plan won’t have time to go awry and be caught and argued about and then regulated against. Also the AI in question has to be scarily smart, otherwise its takeover attempt will fail so early that it won’t be registered as such; it’ll be like GPT-3 lying to users to get reward or Facebook’s recommendation algorithm causing thousands of teenage girls to kill themselves, people will be like “Oh yes this was an error, good thing we train that sort of thing away, see look how the system behaves better now.”
The effect of the above points is not to provide us uniform optimism about AI safety, and our collective future. It is true that, if we accept the previous theses, then many of the points in Eliezer’s list of AI lethalities become far less plausible. But, equally, one could view these theses pessimistically, by thinking that they imply the trajectory of future AI is much harder to intervene on, and do anything about, relative to the Yudkowskian view.
I haven’t gone through the list point by point, so I won’t comment on this. I agree that in longer-timeline, slower-takeoff worlds, we have less influence relative to other humans.
It’s not so much an incident as a trend. I haven’t investigated it myself, but I’ve read lots of people making this claim, citing various studies, etc. See e.g. the documentary The Social Dilemma, featuring Tristan Harris.
There’s an academic literature on the subject now which I haven’t read but which you can probably find by googling.
I just did a quick search and found graphs like this:
Presumably not all of the increase in deaths is due to Facebook; presumably it’s multi-causal blah blah blah. But even if Facebook is responsible for a tiny fraction of the increase, that would mean Facebook was responsible for thousands of deaths.
You said you weren’t replying to any specific point Eliezer was making, but I think it’s worth pointing out that when he brings up AlphaGo, he’s not talking about the 2 years it took Google to build a Go-playing AI—remarkable and surprising as that was—but rather the 3 days it took AlphaGo Zero to go from not knowing anything about the game beyond the basic rules to being better than all humans and the earlier AIs.
I think the ability for humans to communicate and coordinate is a double-edged sword. In particular, it enables the attack vector of dangerous self-propagating memes. I expect memetic warfare to play a major role in many of the failure scenarios I can think of. As we’ve seen, even humans are capable of crafting some pretty potent memes, and even defending against human actors is difficult.
I think it’s likely that the relevant reference class here is research bets rather than the “task” of AGI. An extremely successful research bet could be currently underinvested in, but once it shows promise, discontinuous (relative to the bet) amounts of resources will be dumped into scaling it up, even if the overall investment towards the task as a whole remains continuous. In other words, in this case even though investment into AGI may be continuous (though that might not even hold), discontinuity can occur on the level of specific research bets. Historical examples would include ImageNet seeing discontinuous improvement with AlexNet despite continuous investment into image recognition to that point. (Also, for what it’s worth, my personal model of AI doom doesn’t depend heavily on discontinuities existing, though they do make things worse.)
I think there exist plausible alternative explanations for why capability gains have been primarily driven by compute. For instance, it may be that, because ML talent is extremely expensive whereas compute gets half as expensive every 18 months or whatever, it doesn’t make economic sense to figure out compute-efficient AGI. Given that humans need orders of magnitude less data and compute than current models, and that the human genome isn’t that big and is mostly not cognition-related, it seems plausible that we already have enough hardware for AGI if we had the textbook from the future, though I have fairly low confidence on this point.
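The genome point is easy to check with ballpark arithmetic (standard genome figures; the model size is just a GPT-3-scale reference, not a claim about any particular system):

```python
# Quick size comparison behind the "the genome isn't that big" point.
# Genome figures are standard ballpark numbers; the model is GPT-3-scale as an example.

base_pairs = 3.1e9                  # approximate human genome length
bits_per_base = 2                   # 4 possible bases
genome_bytes = base_pairs * bits_per_base / 8

model_params = 175e9                # GPT-3-scale parameter count, as a reference point
bytes_per_param = 2                 # 16-bit weights
model_bytes = model_params * bytes_per_param

print(f"genome: ~{genome_bytes / 1e9:.2f} GB uncompressed")        # ~0.78 GB
print(f"175B-param model at fp16: ~{model_bytes / 1e9:.0f} GB")    # ~350 GB
print(f"the model is ~{model_bytes / genome_bytes:.0f}x larger than the genome")
```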
Monolithic agents have the advantage that they’re able to reason about things that involve unlikely connections between extremely disparate fields. I would argue that the current human specialization is at least in part due to constraints about how much information one person can know. It also seems plausible that knowledge can be siloed in ways that make inference cost largely detached from the number of domains the model is competent in. Finally, people have empirically just been really excited about making giant monolithic models. Overall, it seems like there is enough incentive to make monolithic models that it’ll probably be an uphill battle to convince people not to do them.
I generally agree with the regulation point, given the caveat. I do want to point out that since substantive regulation often moves very slowly, especially when there are well-funded actors trying to prevent AGI development from being regulated, even in non-foom scenarios (months to years) governments might not move fast enough (example: think about how slowly climate-change-related regulations get adopted).
I hate how convincing so many different people are. I wish I just had some fairly static, reasoned perspective based on object-level facts and not persuasion strings.
Note that convincing is a 2-place word. I don’t think I can transfer this ability, but I haven’t really tried, so here’s a shot:
The target is: “reading as dialogue.” Have a world-model. As you read someone else, be simultaneously constructing / inferring “their world-model” and holding “your world-model”, noting where you agree and disagree.
If you focus too much on “how would I respond to each line”, you lose the ability to listen and figure out what they’re actually pointing at. If you focus too little on “how would I respond to this”, you lose the ability to notice disagreements, holes, and notes of discord.
The first homework exercise I’d try is printing out something (probably with double spacing) and writing your thoughts after each sentence: “uh huh”, “wait what?”, “yes and”, “no but”, etc.; at the beginning you’re probably going to be alternating between the two moves before you can do them simultaneously.
[Historically, I think I got this both from ‘reading a lot’, including a lot of old books, and also ‘arguing on the internet’ in forum environments that only sort of exist today, which was a helpful feedback loop for the relevant subskills, and of course whatever background factors made me do those activities.]
Users can’t delete their own comments if the comment has been replied to, to avoid disrupting other people’s content. (you can edit it to be blank though, or mark it as retracted)
These disagreements mainly concern the relative power of future AIs, the polarity of takeoff, takeoff speed, and, in general, the shape of future AIs. Do you also have detailed disagreements about the difficulty of alignment? If anything, the fact that the future unfolds differently in your view should impact future alignment efforts (but you also might have other considerations informing your view on alignment).
You partially answer this in the last point, saying: “But, equally, one could view these theses pessimistically.” But what do you personally think? Are you more pessimistic, more optimistic, or equally pessimistic about humanity’s chances of surviving AI progress? And why?
Part of what makes it difficult for me to talk about alignment difficulty is that the concept doesn’t fit easily into my paradigm of thinking about the future of AI. If I am correct, for example, that AI services will be modular, marginally more powerful than what comes before, and numerous as opposed to monolithic, then there will not be one alignment problem, but many.
I could talk about potential AI safety principles, healthy cultural norms, and specific engineering issues, but not “a problem” called “aligning the AI” — a soft prerequisite for explaining how difficult “the problem” will be. Put another way, my understanding is that future AI alignment will be continuous with ordinary engineering, like cars and skyscrapers. We don’t ordinarily talk about how hard the problem of building a car is, in some sort of absolute sense, though there are many ways of operationalizing what that could mean.
One question is how costly it is to build a car. We could then compare that cost to the overall consumer benefit that people get from cars, and from that, deduce whether and how many cars will be built. Similarly, we could ask about the size of the “alignment tax” (the cost of aligning an AI above the cost of building AI), and compare it to the benefits we get from aligning AI at all.
My starting point in answering this question is to first emphasize the large size of the benefits: what someone gets if they build AI correctly. We should expect this benefit to be extremely large, and thus, we should also expect people to pay very large amounts to align their AIs, including through government regulation and other social costs.
Will people still fail to align AI services, in various ways, due to numerous issues (e.g. mesa-misalignment or outer misalignment) arising from lack of oversight and transparency? Sure — and I’m uncertain by how much this will occur — but because of the points I gave in my original comment, these seem unlikely to be fatal issues, on a civilizational level. It is perhaps less analogous to nukes than to how car safety sometimes fails (though I do not want to lean heavily on this comparison, as there are real differences too).
Now, there is a real risk in misunderstanding me here. AI values and culture could drift very far from human values over time. And eventually, this could culminate in an existential risk. This is all very vague, but if I were forced to guess the probability of this happening — as in, it’s all game over and we lose as humans — I’d maybe go with 25%.
Btw, your top-level comment is one of the best comments I’ve come across ever. Probably. Top 5? Idk, I’ll check how I feel tomorrow. Aspiring to read everything you’ve ever written rn.
Incidentally, you mention that
the concept doesn’t fit easily into my paradigm of thinking about the future of AI.
And I’ve been thinking lately about how important it is to prioritise original thinking before you’ve consumed all the established literature in an active field of research.[1] If you manage to diverge early, the novelty of your perspective compounds over time (feel free to ask about my model) and you’re more likely to end up with a productively different paradigm from what’s already out there.
Did you ever feel embarrassed trying to think for yourself when you didn’t feel like you had read enough? Or, did you feel like other people might have expected you to feel embarrassed for how seriously you took your original thoughts, given how early you were in your learning arc?
I’m not saying you haven’t. I’m just guessing that you acquired your paradigm by doing original thinking early, and thus had the opportunity to diverge early, rather than greedily over-prioritising the consumption of existing literature in order to “reach the frontier”. Once having hastily consumed someone else’s paradigm, it’s much harder to find its flaws and build something else from the ground up.
The first AGIs we construct will be born into a culture already capable of coordinating, and sharing knowledge, making the potential power difference between AGI and humans relatively much smaller than between humans and other animals, at least at first.
But wouldn’t an AGI be able to coordinate and share knowledge with humans, because
a) it can impersonate a human online and communicate with people via text and speech, and
b) it’ll realize such coordination is vital to accomplishing its goals, and so it’ll do the necessary acculturation?
Watching all the episodes of Friends or reading all the social media posts by the biggest influencers, as examples.
One reason that a fully general AGI might be more profitable than specialised AIs, despite obvious gains-from-specialisation, is if profitability depends on insight-production. For humans, it’s easier to understand a particular thing the more other things you understand. One of the main ways you make novel intellectual progress is by combining remote associations from models about different things. Insight-ability for a particular novel task grows with the number of good models you have available to draw connections between.
But, it could still be that the gains from increased generalisation for a particular model grows too slowly and can’t compete with obvious gains from specialised AIs.
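One way to make the insight-production argument concrete (my framing, not the commenter’s): if insights come from combining models of different domains, the number of available combinations grows combinatorially with how many domains a single mind holds well, while a team of specialists only gets cross-domain pairs via costly communication.

```python
# Number of cross-domain combinations available to a single generalist mind holding
# n domains, versus specialists who each hold only a few. Purely illustrative.

from math import comb

for n_domains in [2, 5, 10, 50, 200]:
    pairs = comb(n_domains, 2)     # two-domain combinations
    triples = comb(n_domains, 3)   # three-domain combinations grow even faster
    print(f"{n_domains:4d} domains -> {pairs:7d} pairs, {triples:10d} triples")
```

Whether this combinatorial advantage outpaces the obvious gains from specialisation is exactly the open question raised above.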
I think raw intelligence, while important, is not the primary factor that explains why humanity-as-a-species is much more powerful than chimpanzees-as-a-species. Notably, humans were once much less powerful, in our hunter-gatherer days, but over time, through the gradual process of accumulating technology, knowledge, and culture, humans now possess vast productive capacities that far outstrip our ancient powers.
Slightly relatedly, I think it’s possible that “causal inference is hard”. The idea is: once someone has worked something out, they can share it and people can pick it up easily, but it’s hard to figure the thing out to begin with—even with a lot of prior experience and efficient inference, most new inventions still need a lot of trial and error. Thus the reason the process of technology accumulation is gradual is, crudely, because causal inference is hard.
Even if this is true, one way things could still go badly is if most doom scenarios are locked behind a bunch of hard trial and error, but the easiest one isn’t. On the other hand, if both of these things are true then there could be meaningful safety benefits gained from censoring certain kinds of data.
This is what struck me as the least likely to be true from the above AI doom scenario.
Is diamondoid nanotechnology possible? Very likely it is or something functionally equivalent.
Can a sufficiently advanced superintelligence infer how to build it from scratch solely based on human data? Or will it need a large R&D center with many, many robotic systems that conduct experiments in parallel, to extract the information it needs about the specific details of physics in our actual universe, rather than the very slightly incorrect approximations a simulator will give you?
The ‘huge R&D center so big you can’t see the end of it’ is somewhat easier to regulate than the ‘invisible dust the AI assembles with clueless stooges’.
Any individual doomsday mechanism we can think of, I would agree is not nearly so simple for an AGI to execute as Yudkowsky implies. I do think that it’s quite likely we’re just not able to think of mechanisms even theoretically that an AGI could think of, and one or more of those might actually be quite easy to do secretly and quickly. I wouldn’t call it guaranteed by any means, but intuitively this seems like the sort of thing that raw cognitive power might have a significant bearing on.
I agree. One frightening mechanism I thought of is: “Ok, assume the AGI can’t craft the bioweapon or nanotechnology killbots without collecting vast amounts of information through carefully selected and performed experiments (basically enormous complexes full of robotics). How does it get the resources it needs?”
And the answer is it scams humans into doing it. We have many examples of humans trusting someone they shouldn’t even when the evidence was readily available that they shouldn’t.
Any “huge R&D center” constraint is trivialized in a future where agile, powerful robots will be ubiquitous and an AGI can use robots to create an underground lab in the middle of nowhere, using its superintelligence to be undetectable in all ways that are physically possible. An AGI will also be able to use robots and 3D printers to fabricate purpose-built machines that enable it to conduct billions of physical experiments a day. Sure, it would be harder to construct something like a massive particle accelerator, but 1) that isn’t needed to make killer nanobots, and 2) even that isn’t impossible for a sufficiently intelligent machine to create covertly and quickly.
It’s as good as time as any to re-iterate my reasons for disagreeing with what I see as the Yudkowskian view of future AI. What follows isn’t intended as a rebuttal of any specific argument in this essay, but merely a pointer that I’m providing for readers, that may help explain why some people might disagree with the conclusion and reasoning contained within.
I’ll provide my cruxes point-by-point,
I think raw intelligence, while important, is not the primary factor that explains why humanity-as-a-species is much more powerful than chimpanzees-as-a-species. Notably, humans were once much less powerful, in our hunter-gatherer days, but over time, through the gradual process of accumulating technology, knowledge, and culture, humans now possess vast productive capacities that far outstrip our ancient powers.
Similarly, our ability to coordinate through language also plays a huge role in explaining our power compared to other animals. But, on a first approximation, other animals can’t coordinate at all, making this distinction much less impressive. The first AGIs we construct will be born into a culture already capable of coordinating, and sharing knowledge, making the potential power difference between AGI and humans relatively much smaller than between humans and other animals, at least at first.
Consequently, the first slightly smarter-than-human agent will probably not be able to leverage its raw intelligence to unilaterally take over the world, for pretty much the same reason that an individual human would not be able to unilaterally take over a band of chimps, in the state of nature, despite the intelligence advantage of the human.
There’s a large range of human intelligence, such that it makes sense to talk about AI slowly going from 50th percentile to 99.999th percentile on pretty much any important general intellectual task, rather than AI suddenly jumping to superhuman levels after a single major insight. In cases where progress in performance does happen rapidly, the usual reason is that there wasn’t much effort previously being put into getting better at the task.
The case of AlphaGo is instructive here: improving the SOTA on Go bots is not very profitable. We should expect, therefore, that there will be relatively few resources being put into that task, compared to the overall size of the economy. However, if a single rich company, like Google, at some point does decide to invest considerable resources into improving Go performance, then we could easily observe a discontinuity in progress. Yet, this discontinuity in output merely reflects a discontinuity in inputs, not a discontinuity as a response to small changes in those inputs, as is usually a prerequisite for foom in theoretical models.
Hardware progress and experimentation are much stronger drivers of AI progress than novel theoretical insights. The most impressive insights, like backpropagation and transformers, are probably in our past. And as the field becomes more mature, it will likely become even harder to make important theoretical discoveries.
These points make the primacy of recursive self-improvement, and as a consequence, unipolarity in AI takeoff, less likely in the future development of AI. That’s because hardware progress and AI experimentation are, for the most part, society-wide inputs, which can be contributed by a wide variety of actors, don’t exhibit strong feedback loops on an individual level, and more-or-less have smooth responses to small changes in their inputs. Absent some way of making AI far better via a small theoretical tweak, it seems that we should expect smooth, gradual progress by default, even if overall economic growth becomes very high after the invention of AGI.
[Update (June 2023): While I think these considerations are still important, I think the picture I painted in this section was misleading. I wrote about my views of AI services here.] There are strong pressures—including the principle of comparative advantage, diseconomies of scale, and gains from specialization—that incentivize making economic services narrow and modular, rather than general and all-encompassing. Illustratively, a large factory where each worker specializes in their particular role will be much more productive than a factory in which each worker is trained to be a generalist, even though no one understands any particular component of the production process very well.
What is true in human economics will apply to AI services as well. This implies we should expect something like Eric Drexler’s AI perspective, which emphasizes economic production across many agents who trade and produce narrow services, as opposed to monolithic agents that command and control.
Having seen undeniable, large economic effects from AI, policymakers will eventually realize that AGI is important, and will launch massive efforts to regulate it. The current lack of concern almost certainly reflects the fact that powerful AI hasn’t arrived yet.
There’s a long history of people regulating industries after disasters—like nuclear energy—and, given the above theses, it seems likely that there will be at least a few “warning shots” which will provide a trigger for companies and governments to crack down and invest heavily into making things go the way they want.
(Note that this does not imply any sort of optimism about the effects of these regulations, only that they will exist and will have a large effect on the trajectory of AI)
The effect of the above points is not to provide us uniform optimism about AI safety, and our collective future. It is true that, if we accept the previous theses, then many of the points in Eliezer’s list of AI lethalities become far less plausible. But, equally, one could view these theses pessimistically, by thinking that they imply the trajectory of future AI is much harder to intervene on, and do anything about, relative to the Yudkowskian view.
I basically buy the story that human intelligence is less useful that human coordination; i.e. it’s the intelligence of “humanity” the entity that matters, with the intelligence of individual humans relevant only as, like, subcomponents of that entity.
But… shouldn’t this mean you expect AGI civilization to totally dominate human civilization? They can read each other’s source code, and thus trust much more deeply! They can transmit information between them at immense bandwidths! They can clone their minds and directly learn from each other’s experiences!
Like, one scenario I visualize a lot is the NHS having a single ‘DocBot’, i.e. an artificial doctor run on datacenters that provides medical advice and decision-making for everyone in the UK (while still working with nurses and maybe surgeons and so on). Normally I focus on the way that it gets about three centuries of experience treating human patients per day, but imagine the difference in coordination capacity between DocBot and the BMA.
I think everyone expects this, and often disagree on the timescale on which it will arrive. See, for example, Elon Musk’s speech to the US National Governors Association, where he argues that the reactive regulation model will be too slow to handle the crisis.
But I think the even more important disagreement is on whether or not regulations should be expected to work. Ok, so you make it so that only corporations with large compliance departments can run AGI. How does that help? There was a tweet by Matt Yglesias a while ago that I can’t find now, which went something like: “a lot of smart people are worried about AI, and when you ask them what the government can do about it, they have no idea; this is an extremely wild situation from the perspective of a policy person.” A law that says “don’t run the bad code” is predicated on the ability to tell the good code from the bad code, which is the main thing we’re missing and don’t know how to get!
And if you say something like “ok, one major self-driving car accident will be enough to convince everyone to do the Butlerian Jihad and smash all the computers”, that’s really not how it looks to me. Like, the experience of COVID seems a lot like “people who were doing risky research in labs got out in front of everyone else to claim that the lab leak hypothesis was terrible and unscientific, and all of the anti-disinformation machinery was launched to suppress it, and it took a shockingly long time to even be able to raise the hypothesis, and it hasn’t clearly swept the field, and legislation to do something about risky research seems like it definitely isn’t a slam dunk.”
When we get some AI warning signs, I expect there are going to be people with the ability to generate pro-AI disinfo and a strong incentive to do so. I expect there to be significant latent political polarization which will tangle up any attempt to do something useful about it. I expect there won’t be anything like the international coordination that was necessary to set up anti-nuclear-proliferation efforts to set up the probably harder problem of anti-AGI-proliferation efforts.
This is 100% correct, and part of why I expect the focus on superintelligence, while literally true, is bad for AI outreach. There’s a much simpler (and empirically, in my experience, more convincing) explanation of why we lose to even an AI with an IQ of 110. It is Dath Ilan, and we are Earth. Coordination is difficult for humans and the easy part for AIs.
I will note that Eliezer wrote That Alien Message a long time ago I think in part to try to convey the issue to this perspective, but it’s mostly about “information-theoretic bounds are probably not going to be tight” in a simulation-y universe instead of “here’s what coordination between computers looks like today”. I do predict the coordination point would be good to include in more of the intro materials.
I don’t think it’s obvious that this means that AGI is more dangerous, because it means that for a fixed total impact of AGI, the AGI doesn’t have to be as competent at individual thinking (because it leans relatively more on group thinking). And so at the point where the AGIs are becoming very powerful in aggregate, this argument pushes us away from thinking they’re good at individual thinking.
Also, it’s not obvious that early AIs will actually be able to do this if their creators don’t find a way to train them to have this affordance. ML doesn’t currently normally make AIs which can helpfully share mind-states, and it probably requires non-trivial effort to hook them up correctly to be able to share mind-state.
Being able to read source code doesn’t automatically increase trust—you also have to be able to verify that the code being shared with you actually governs the AGI’s behavior, despite that AGI’s incentives and abilities to fool you.
(Conditional on the AGIs having strongly aligned goals with each other, sure, this degree of transparency would help them with pure coordination problems.)
Nice! Thanks! I’ll give my commentary on your commentary, also point by point. Your stuff italicized, my stuff not. Warning: Wall of text incoming! :)
I think raw intelligence, while important, is not the primary factor that explains why humanity-as-a-species is much more powerful than chimpanzees-as-a-species. Notably, humans were once much less powerful, in our hunter-gatherer days, but over time, through the gradual process of accumulating technology, knowledge, and culture, humans now possess vast productive capacities that far outstrip our ancient powers.
Similarly, our ability to coordinate through language also plays a huge role in explaining our power compared to other animals. But, on a first approximation, other animals can’t coordinate at all, making this distinction much less impressive. The first AGIs we construct will be born into a culture already capable of coordinating, and sharing knowledge, making the potential power difference between AGI and humans relatively much smaller than between humans and other animals, at least at first.
I don’t think I understand this argument. Yes, humans can use language to coordinate & benefit from cultural evolution, so an AI that can do that too (but is otherwise unexceptional) would have no advantage. But the possibility we are considering is that AI might be to humans what humans are to monkeys; thus, if the difference between humans and monkeys is greater intelligence allowing them to accumulate language, there might be some similarly important difference between AIs and humans. For example, language is a tool that lets humans learn from the experience of others, but AIs can literally learn from the experience of others—via the mechanism of having many copies that share weights and gradient updates! They can also e.g. graft more neurons onto an existing AI to make it smarter, think at greater serial speed, integrate calculators and other programs into their functioning and learn to use them intuitively as part of their regular thought processes… I won’t be surprised if somewhere in the grab bag of potential advantages AIs have over humans is one (or several added together) as big as the language advantage humans have over monkeys.
Plus, there’s language itself. It’s not a binary, it’s a spectrum; monkeys can use it too, to some small degree. And some humans can use it more/better than others. Perhaps AIs will (eventually, and perhaps even soon) be better at using language than the best humans.
Consequently, the first slightly smarter-than-human agent will probably not be able to leverage its raw intelligence to unilaterally take over the world, for pretty much the same reason that an individual human would not be able to unilaterally take over a band of chimps, in the state of nature, despite the intelligence advantage of the human.
Here’s how I think we should think about it. Taboo “intelligence.” Instead we just have a list of metrics a, b, c, … z, some of which are overlapping, some of which are subsets of others, etc. One of these metrics, then, is “takeover ability (intellectual component).” This metric, when combined with “takeover ability (resources),” “Takeover ability (social status)” and maybe a few others that track “exogenous” factors about how others treat the AI and what resources it has, combine together to create “overall takeover ability.”
Now, I claim, (1) Takeover is a tournament (blog post TBD, but see my writings about lessons from the conquistadors) and I cite this as support for claim (2) takeover would be easy for AIs, by which I mean, IF AIs were mildly superhuman in the intellectual component of takeover ability, they would plausibly start off with enough of the other components that they would be able to secure more of those other components fairly quickly, stay out of trouble, etc. until they could actually take over—in other words, their overall takeover ability would be mildly superhuman as well.
(I haven’t argued for this much yet but I plan to in future posts. Also I expect some people will find it obvious, and maybe you are one such person.)
Now, how should we think about AI timelines-till-human-level-takeover-ability-(intellectual)?
Same way we think about AI timelines for AGI, or TAI, or whatever. I mean obviously there are differences, but I don’t think we have reason to think that the intellectual component of takeover ability is vastly more difficult than e.g. being human-level AGI, or being able to massively accelerate world GDP, or being able to initiate recursive self-improvement or an R&D acceleration.
I mean it might be. It’s a different metric, after all. But it also might come earlier than those things. It might be easier. And I have plausibility arguments to make for that claim in fact.
So anyhow I claim: We can redo all our timelines analyses with “slightly superhuman takeover ability (intellectual)” as the thing to forecast instead of TAI or AGI or whatever, and get roughly the same numbers. And then (I claim) this is tracking when we should worry about AI takeover. Yes, by a single AI system, if only one exists; if multiple exist then by multiple.
We can hope that we’ll get really good AI alignment research assistants before we get AIs good at taking over… but that’s just a hope at this point; it totally could come in the opposite order and I have arguments that it would.
There’s a large range of human intelligence, such that it makes sense to talk about AI slowly going from 50th percentile to 99.999th percentile on pretty much any intellectual task, rather than AI suddenly jumping to superhuman levels after a single major insight. In cases where progress in performance does happen rapidly, the usual reason is that there wasn’t much effort previously being put into getting better at the task.
The case of AlphaGo is instructive here: improving the SOTA on Go bots is not very profitable. We should expect, therefore, that there will be relatively few resources being put into that task, compared to the overall size of the economy. However, if a single rich company, like Google, at some point does decide to invest considerable resources into improving Go performance, then we could easily observe a discontinuity in progress. Yet, this discontinuity in output merely reflects a discontinuity in inputs, not a discontinuity as a response to small changes in those inputs, as is usually a prerequisite for foom in theoretical models.
Hardware progress and experimentation are much stronger drivers of AI progress than novel theoretical insights. The most impressive insights, like backpropagation and transformers, are probably in our past. And as the field becomes more mature, it will likely become even harder to make important theoretical discoveries.
These points make the primacy of recursive self-improvement, and as a consequence, unipolarity in AI takeoff, less likely in the future development of AI. That’s because hardware progress and AI experimentation are, for the most part, society-wide inputs, which can be contributed by a wide variety of actors, don’t exhibit strong feedback loops on an individual level, and more-or-less have smooth responses to small changes in their inputs. Absent some way of making AI far better via a small theoretical tweak, it seems that we should expect smooth, gradual progress by default, even if overall economic growth becomes very high after the invention of AGI.
I claim this argument is a motte and bailey. The motte is the first three paragraphs, where you give good sensible reasons to think that discontinuities and massive conceptual leaps, while possible, are not typical. The bailey is the last paragraph where you suggest that we can therefore conclude unipolar takeoff is unlikely and that progress will go the way Paul Christiano thinks it’ll go instead of the way Yudkowsky thinks it’ll go. I have sat down to make toy models of what takeoff might look like, and even with zero discontinuities and five-year-spans of time to “cross the human range” the situation looks qualitatively a lot more like Yudkowsky’s story than Christiano’s. Of course you shouldn’t take my word for it, and also just because the one or two models I made looked this way doesn’t mean I’m right, maybe someone with different biases could make different models that would come out differently. But still. (Note: Part of why my models came out this way was that I was assuming stuff happens in 5-15 years from now. Paul Christiano would agree, I think, that given this assumption takeoff would be pretty fast. I haven’t tried to model what things look like on 20+ year timelines.)
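For concreteness, here’s a minimal sketch of the kind of toy model I mean (this is an illustrative reconstruction with made-up parameters, not the actual models I built): capability grows smoothly with zero discontinuities, yet the window between “enters the human range” and “exceeds the best humans” is only a few years.

```python
# A made-up illustrative model, NOT the actual toy models referred to above:
# capability grows smoothly (exponentially, zero discontinuities), yet the
# window between entering the human range and exceeding the best humans is
# only a few years.

DOUBLING_TIME_YEARS = 1.0     # assumed smooth doubling time of "effective capability"
HUMAN_RANGE_LOW = 1.0         # median-human capability (arbitrary units)
HUMAN_RANGE_HIGH = 32.0       # best-human capability (five doublings above the median)
START_CAPABILITY = 0.01       # AI starts well below the human range

def capability(t: float) -> float:
    """Smooth exponential growth -- no discontinuities by construction."""
    return START_CAPABILITY * 2 ** (t / DOUBLING_TIME_YEARS)

def first_time_reaching(level: float, dt: float = 0.01) -> float:
    t = 0.0
    while capability(t) < level:
        t += dt
    return t

t_low = first_time_reaching(HUMAN_RANGE_LOW)
t_high = first_time_reaching(HUMAN_RANGE_HIGH)
print(f"enters human range at year {t_low:.1f}")
print(f"exceeds best humans at year {t_high:.1f}")
print(f"time to cross the human range: {t_high - t_low:.1f} years")
```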
There are strong pressures—including the principle of comparative advantage, diseconomies of scale, and gains from specialization—that incentivize making economic services narrow and modular, rather than general and all-encompassing. Illustratively, a large factory where each worker specializes in their particular role will be much more productive than a factory in which each worker is trained to be a generalist, even though no one understands any particular component of the production process very well.
What is true in human economics will apply to AI services as well. This implies we should expect something like Eric Drexler’s AI perspective, which emphasizes economic production across many agents who trade and produce narrow services, as opposed to monolithic agents that command and control.
This may be our biggest disagreement. Drexler’s vision of comprehensive AI services is a beautiful fantasy IMO. Agents are powerful. There will be plenty of AI services, yes, but there will also be AI agents, and those are what we are worried about. And yes it’s theoretically possible to develop the right AI services in advance to help us control the agents when they appear… but we’d best get started building them then, because they aren’t going to build themselves. And eyeballing the progress towards AI agents vs. useful interpretability tools etc., it’s not looking good.
Having seen undeniable, large economic effects from AI, policymakers will eventually realize that AGI is important, and will launch massive efforts to regulate it. The current lack of concern almost certainly reflects the fact that powerful AI hasn’t arrived yet.
There’s a long history of people regulating industries after disasters—like nuclear energy—and, given the above theses, it seems likely that there will be at least a few “warning shots” which will provide a trigger for companies and governments to crack down and invest heavily into making things go the way they want.
(Note that this does not imply any sort of optimism about the effects of these regulations, only that they will exist and will have a large effect on the trajectory of AI)
I agree in principle, but unfortunately it seems like things are going to happen fast enough (over the span of a few years at most) and soon enough (in the next decade or so, NOT in 30 years after the economy has already been transformed by narrow AI systems) that it really doesn’t seem like governments are going to do much by default. We still have the opportunity to plan ahead and get governments to do stuff! But I think if we sit on our asses, nothing of use will happen. (Probably there will be some regulation but it’ll be irrelevant like most regulation is.)
In particular I think that we won’t get any cool exciting scary AI takeover near-misses that cause massive crackdowns on the creation of AIs that could possibly take over, the way we did for nuclear power plants. Why would we? The jargon for this is “Sordid Stumble before Treacherous Turn.” It might happen but we shouldn’t expect it by default I think. Yes, before AIs are smart enough to take over, they will be dumber. But what matters is: Before an AI is smart enough to take over and smart enough to realize this, will there be an AI that can’t take over but thinks it can? And “before” can’t be “two weeks before” either, it probably needs to be more like two months or two years, otherwise the dastardly plan won’t have time to go awry and be caught and argued about and then regulated against. Also the AI in question has to be scarily smart, otherwise its takeover attempt will fail so early that it won’t be registered as such; it’ll be like GPT-3 lying to users to get reward or Facebook’s recommendation algorithm causing thousands of teenage girls to kill themselves, and people will be like “Oh yes this was an error, good thing we train that sort of thing away, see look how the system behaves better now.”
The effect of the above points is not to provide us uniform optimism about AI safety, and our collective future. It is true that, if we accept the previous theses, then many of the points in Eliezer’s list of AI lethalities become far less plausible. But, equally, one could view these theses pessimistically, by thinking that they imply the trajectory of future AI is much harder to intervene on, and do anything about, relative to the Yudkowskian view.
I haven’t gone through the list point by point, so I won’t comment on this. I agree that in longer-timeline, slower-takeoff worlds we have less influence relative to other humans.
“I have sat down to make toy models ..”
reference?
? I am the reference, I’m describing a personal experience.
I meant, is there a link to where you’ve written this down somewhere? Maybe you just haven’t written it down.
I’ll send you a DM.
Markdown has syntax for quotes: a line with
> this
on it will render as a quote.
Can I get a link or two to read more about this incident?
It’s not so much an incident as a trend. I haven’t investigated it myself, but I’ve read lots of people making this claim, citing various studies, etc. See e.g. “The Social Dilemma” by Tristan Harris.
There’s an academic literature on the subject now which I haven’t read but which you can probably find by googling.
I just did a quick search and found graphs like this:
Presumably not all of the increase in deaths is due to Facebook; presumably it’s multi-causal blah blah blah. But even if Facebook is responsible for a tiny fraction of the increase, that would mean Facebook was responsible for thousands of deaths.
You said you weren’t replying to any specific point Eliezer was making, but I think it’s worth pointing out that when he brings up AlphaGo, he’s not talking about the 2 years it took Google to build a Go-playing AI—remarkable and surprising as that was—but rather the 3 days it took AlphaZero to go from not knowing anything about the game beyond the basic rules to being better than all humans and the earlier AIs.
Some quick thoughts on these points:
I think the ability for humans to communicate and coordinate is a double-edged sword. In particular, it enables the attack vector of dangerous self-propagating memes. I expect memetic warfare to play a major role in many of the failure scenarios I can think of. As we’ve seen, even humans are capable of crafting some pretty potent memes, and even defending against human actors is difficult.
I think it’s likely that the relevant reference class here is research bets rather than the “task” of AGI. An extremely successful research bet could be currently underinvested in, but once it shows promise, discontinuous (relative to the bet) amounts of resources will be dumped into scaling it up, even if the overall investment towards the task as a whole remains continuous. In other words, in this case even though investment into AGI may be continuous (though that might not even hold), discontinuity can occur on the level of specific research bets. Historical examples would include ImageNet seeing discontinuous improvement with AlexNet despite continuous investment into image recognition up to that point. (Also, for what it’s worth, my personal model of AI doom doesn’t depend heavily on discontinuities existing, though they do make things worse.)
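A stylized sketch of what I mean (all numbers made up): aggregate investment into the task grows smoothly, but the share flowing to one particular bet jumps once that bet shows promise, so the bet itself sees a discontinuity.

```python
# Made-up numbers, purely illustrative: total investment into the task grows
# smoothly (~10%/year), but the share flowing to one particular bet jumps once
# it "shows promise" in year 5, so investment into that bet is discontinuous
# even though investment into the task as a whole is not.
total_by_year = [100 * 1.1 ** year for year in range(10)]
share_by_year = [0.01 if year < 5 else 0.50 for year in range(10)]
bet_by_year = [total * share for total, share in zip(total_by_year, share_by_year)]

for year, (total, bet) in enumerate(zip(total_by_year, bet_by_year)):
    print(f"year {year}: task-wide = {total:6.1f}, this bet = {bet:6.1f}")
```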
I think there exist plausible alternative explanations for why capability gains have been primarily driven by compute. For instance, because ML talent is extremely expensive whereas compute gets half as expensive every 18 months or so, it may simply not make economic sense to figure out compute-efficient AGI. Given the fact that humans need orders of magnitude less data and compute than current models, and that the human genome isn’t that big and is mostly not cognition-related, it seems plausible that we already have enough hardware for AGI if we had the textbook from the future, though I have fairly low confidence on this point.
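A rough sketch of the economic intuition behind that first point (stylized numbers, not actual budget data):

```python
# Stylized arithmetic, not actual budget figures: if compute halves in price
# every ~18 months while researcher salaries stay flat, a decade later the same
# budget buys roughly 100x more compute, so "buy compute" tends to beat
# "pay researchers to make models more compute-efficient".
years = 10
halvings = years * 12 / 18
compute_per_dollar_multiplier = 2 ** halvings
print(f"~{halvings:.1f} halvings over {years} years "
      f"=> roughly {compute_per_dollar_multiplier:.0f}x more compute per dollar")
```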
Monolithic agents have the advantage that they’re able to reason about things that involve unlikely connections between extremely disparate fields. I would argue that the current human specialization is at least in part due to constraints about how much information one person can know. It also seems plausible that knowledge can be siloed in ways that make inference cost largely detached from the number of domains the model is competent in. Finally, people have empirically just been really excited about making giant monolithic models. Overall, it seems like there is enough incentive to make monolithic models that it’ll probably be an uphill battle to convince people not to do them.
Generally agree with the regulation point, given the caveat. I do want to point out that since substantive regulation often moves very slowly, especially when there are well-funded actors trying to prevent AGI development from being regulated, even in non-foom scenarios (months to years) regulators might not move fast enough (example: think about how slowly climate-change-related regulations get adopted).
I hate how convincing so many different people are. I wish I just had some fairly static, reasoned perspective based on object-level facts and not persuasion strings.
Note that convincing is a 2-place word. I don’t think I can transfer this ability, but I haven’t really tried, so here’s a shot:
The target is: “reading as dialogue.” Have a world-model. As you read someone else, be simultaneously constructing / inferring “their world-model” and holding “your world-model”, noting where you agree and disagree.
If you focus too much on “how would I respond to each line”, you lose the ability to listen and figure out what they’re actually pointing at. If you focus too little on “how would I respond to this”, you lose the ability to notice disagreements, holes, and notes of discord.
The first homework exercise I’d try is printing out something (probably double-spaced) and writing your thoughts next to each sentence: “uh huh”, “wait what?”, “yes and”, “no but”, etc. At the beginning you’re probably going to be alternating between the two moves before you can do them simultaneously.
[Historically, I think I got this both from ‘reading a lot’, including a lot of old books, and also ‘arguing on the internet’ in forum environments that only sort of exist today, which was a helpful feedback loop for the relevant subskills, and of course whatever background factors made me do those activities.]
Why can’t I delete comments sometimes? >:(
Users can’t delete their own comments if the comment has been replied to, to avoid disrupting other people’s content. (you can edit it to be blank though, or mark it as retracted)
Thanks a lot for writing this.
These disagreements mainly concern the relative power of future AIs, the polarity of takeoff, takeoff speed, and, in general, the shape of future AIs. Do you also have detailed disagreements about the difficulty of alignment? If anything, the fact that the future unfolds differently in your view should impact future alignment efforts (but you also might have other considerations informing your view on alignment).
You partially answer this in the last point, saying: “But, equally, one could view these theses pessimistically.” But what do you personally think? Are you more pessimistic, more optimistic, or equally pessimistic about humanity’s chances of surviving AI progress? And why?
Part of what makes it difficult for me to talk about alignment difficulty is that the concept doesn’t fit easily into my paradigm of thinking about the future of AI. If I am correct, for example, that AI services will be modular, marginally more powerful than what comes before, and numerous as opposed to monolithic, then there will not be one alignment problem, but many.
I could talk about potential AI safety principles, healthy cultural norms, and specific engineering issues, but not “a problem” called “aligning the AI” — a soft prerequisite for explaining how difficult “the problem” will be. Put another way, my understanding is that future AI alignment will be continuous with ordinary engineering, like cars and skyscrapers. We don’t ordinarily talk about how hard the problem of building a car is, in some sort of absolute sense, though there are many ways of operationalizing what that could mean.
One question is how costly it is to build a car. We could then compare that cost to the overall consumer benefit that people get from cars, and from that, deduce whether and how many cars will be built. Similarly, we could ask about the size of the “alignment tax” (the cost of aligning an AI above the cost of building AI), and compare it to the benefits we get from aligning AI at all.
My starting point in answering this question is to first emphasize the large size of the benefits: what someone gets if they build AI correctly. We should expect this benefit to be extremely large, and thus, we should also expect people to pay very large amounts to align their AIs, including through government regulation and other social costs.
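To make the comparison concrete, here is a trivial back-of-the-envelope version (every number is invented purely for illustration): the point is just that actors should pay the alignment tax whenever it is smaller than the expected loss from deploying a misaligned system.

```python
# All figures invented for illustration only: paying the alignment tax is
# worthwhile whenever the expected loss from deploying a misaligned system
# exceeds the tax.
expected_benefit = 1_000.0   # value of deploying an AI that works as intended (arbitrary units)
misalignment_loss = 0.3      # expected fraction of that value lost if deployed without alignment work
alignment_tax = 50.0         # extra cost of building the aligned version

value_aligned = expected_benefit - alignment_tax
value_unaligned = expected_benefit * (1 - misalignment_loss)
print(f"aligned: {value_aligned:.0f}, unaligned: {value_unaligned:.0f}")
print("pay the tax" if value_aligned > value_unaligned else "skip the tax")
```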
Will people still fail to align AI services, in various ways, due to numerous issues, like mesa-misalignment and outer misalignment arising from a lack of oversight and transparency? Sure — and I’m uncertain by how much this will occur — but because of the points I gave in my original comment, these seem unlikely to be fatal issues, on a civilizational level. It is perhaps less analogous to nukes than to how car safety sometimes fails (though I do not want to lean heavily on this comparison, as there are real differences too).
Now, there is a real risk in misunderstanding me here. AI values and culture could drift very far from human values over time. And eventually, this could culminate in an existential risk. This is all very vague, but if I were forced to guess the probability of this happening — as in, it’s all game over and we lose as humans — I’d maybe go with 25%.
Btw, your top-level comment is one of the best comments I’ve come across ever. Probably. Top 5? Idk, I’ll check how I feel tomorrow. Aspiring to read everything you’ve ever written rn.
Incidentally, you mention that
And I’ve been thinking lately about how important it is to prioritise original thinking before you’ve consumed all the established literature in an active field of research.[1] If you manage to diverge early, the novelty of your perspective compounds over time (feel free to ask about my model) and you’re more likely to end up with a productively different paradigm from what’s already out there.
Did you ever feel embarrassed trying to think for yourself when you didn’t feel like you had read enough? Or, did you feel like other people might have expected you to feel embarrassed for how seriously you took your original thoughts, given how early you were in your learning arc?
I’m not saying you haven’t. I’m just guessing that you acquired your paradigm by doing original thinking early, and thus had the opportunity to diverge early, rather than greedily over-prioritising the consumption of existing literature in order to “reach the frontier”. Once having hastily consumed someone else’s paradigm, it’s much harder to find its flaws and build something else from the ground up.
hi Matt! on the coordination crux, you say
but wouldn’t an AGI be able to coordinate and do knowledge sharing with humans because
a) it can impersonate a human online and communicate with them via text and speech, and
b) it’ll realize such coordination is vital to accomplishing its goals and so it’ll do the necessary acculturation?
Watching all the episodes of Friends or reading all the social media posts by the biggest influencers, as examples.
One reason that a fully general AGI might be more profitable than specialised AIs, despite obvious gains-from-specialisation, is if profitability depends on insight-production. For humans, it’s easier to understand a particular thing the more other things you understand. One of the main ways you make novel intellectual progress is by combining remote associations from models about different things. Insight-ability for a particular novel task grows with the number of good models you have available to draw connections between.
But, it could still be that the gains from increased generalisation for a particular model grows too slowly and can’t compete with obvious gains from specialised AIs.
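One toy way to formalize the trade-off above (my own framing; the numbers carry no empirical weight) is to count cross-domain connections:

```python
# Stylized framing of the generalist-vs-specialist trade-off: a single
# generalist covering n domains has n*(n-1)/2 potential cross-domain
# connections "inside one head", whereas n single-domain specialists have none;
# whether that outweighs the gains from specialization depends on how much
# depth each extra domain costs the generalist.
def cross_domain_pairs(n: int) -> int:
    return n * (n - 1) // 2

for n in (2, 5, 10, 20):
    print(f"{n:2d} domains -> {cross_domain_pairs(n):3d} potential cross-domain connections")
```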
Slightly relatedly, I think it’s possible that “causal inference is hard”. The idea is: once someone has worked something out, they can share it and people can pick it up easily, but it’s hard to figure the thing out to begin with—even with a lot of prior experience and efficient inference, most new inventions still need a lot of trial and error. Thus the reason the process of technology accumulation is gradual is, crudely, because causal inference is hard.
Even if this is true, one way things could still go badly is if most doom scenarios are locked behind a bunch of hard trial and error, but the easiest one isn’t. On the other hand, if both of these things are true then there could be meaningful safety benefits gained from censoring certain kinds of data.
This is what struck me as the least likely to be true from the above AI doom scenario.
Is diamondoid nanotechnology possible? Very likely it is, or something functionally equivalent is.
Can a sufficiently advanced superintelligence infer how to build it from scratch solely based on human data? Or will it need a large R&D center with many, many robotic systems conducting experiments in parallel, to extract the required information about the specific details of physics in our actual universe, rather than the very slightly incorrect approximations a simulator will give you?
The ‘huge R&D center so big you can’t see the end of it’ is somewhat easier to regulate than the ‘invisible dust the AI assembles with clueless stooges’.
Any individual doomsday mechanism we can think of, I would agree is not nearly so simple for an AGI to execute as Yudkowsky implies. I do think that it’s quite likely we’re just not able to think of mechanisms even theoretically that an AGI could think of, and one or more of those might actually be quite easy to do secretly and quickly. I wouldn’t call it guaranteed by any means, but intuitively this seems like the sort of thing that raw cognitive power might have a significant bearing on.
I agree. One frightening mechanism I thought of is: “ok, assume the AGI can’t craft the bioweapon or nanotechnology killbots without collecting vast amounts of information through carefully selected and performed experiments (basically enormous complexes full of robotics). How does it get the resources it needs?”
And the answer is it scams humans into doing it. We have many examples of humans trusting someone they shouldn’t even when the evidence was readily available that they shouldn’t.
Any “huge R&D center” constraint is trivialized in a future where agile, powerful robots will be ubiquitous and an AGI can use robots to create an underground lab in the middle of nowhere, using its superintelligence to be undetectable in all ways that are physically possible. An AGI will also be able to use robots and 3D printers to fabricate purpose-built machines that enable it to conduct billions of physical experiments a day. Sure, it would be harder to construct something like a massive particle accelerator, but 1) that isn’t needed to make killer nanobots, and 2) even that isn’t impossible for a sufficiently intelligent machine to create covertly and quickly.