My Current Thoughts on the AI Strategic Landscape
I started working at AI Impacts slightly less than a year ago. Before then, I was not following developments in either AI or AI safety. I do not consider myself a rationalist and did not engage with LessWrong before starting this job. While I have mostly been working on historical case studies,[1] I have gotten a close look at the AI safety community and the arguments therein. I live in a rationalist group house and work out of an AI safety office. I think I have about as informed an opinion on AI safety as is possible without doing a bunch of technical alignment research or being involved in the community for years.
Here are my current opinions on AI safety. Some of them may be wrong: I endorse being wrong more often if the alternative is not saying things of consequence.
This is presented as an organized list of my thoughts. There are arguments in my head justifying most of these, but I will not be spelling them out in detail here. I will link to more detailed arguments when they are available. If something is in italics, then I wrote the argument at that link, or intend to write about it in the future.
This should be readable at any level of the list. If you want a quick overview, you can just read the top level points, in bold. Or you can read some details, but not others. Or you can read everything.
I am mostly unconvinced by the classic story of AI risk.
Currently, AI is not very significant in the global economy / human society. In order to become impactful, the capabilities of AI will have to increase.
AI capabilities are increasing rapidly now.
It is not clear how much AI capabilities will need to increase in order for AI to become very impactful.
Several ways of operationalizing extremely capable AI include whole brain emulation (WBE), artificial general intelligence (AGI), and transformative artificial intelligence (TAI).
Whole brain emulation seems impossible on a (classical) computer.
Neurons / synapses are not logic gates. They have complicated internal dynamics.
Computer engineers put a lot of effort into making sure that individual bits are either 0 or 1 and are independent of any microscopic randomness. Brains do not have this. There is the possibility of chaotic—and also non-chaotic—behavior at any scale. There is no non-chaotic mesoscale preventing unpredictable and uncontrollable microscopic fluctuations from influencing relevant macroscopic behavior.
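To make the chaos point concrete, here is a minimal sketch using the logistic map, a standard toy chaotic system (the map and its parameters are illustrative only, not a model of neurons):

```python
# Logistic map x -> r*x*(1-x) in its chaotic regime (r = 4.0).
# Two trajectories starting 1e-12 apart diverge to order-1 separation
# within a few dozen iterations: a microscopic difference in state
# becomes a macroscopic difference in behavior.

def logistic_trajectory(x0, r=4.0, steps=50):
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs

a = logistic_trajectory(0.2)
b = logistic_trajectory(0.2 + 1e-12)

early_gap = abs(a[5] - b[5])                      # still tiny
max_gap = max(abs(x - y) for x, y in zip(a, b))   # order 1
```

After a handful of steps the two trajectories are still indistinguishable, but the gap roughly doubles each iteration, so by step ~40 it has saturated at the size of the whole state space. A brain with this property cannot be emulated by anything that rounds off the microscopic state.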
There are some examples of inherently quantum effects impacting the macroscopic behavior of some biological systems. These are sometimes not things where the effects can be fully captured by setting some macroscopic parameters.
I am sympathetic to quantum theories of the mind. I don’t think any one that currently exists is satisfying or has solid evidence, but I’m glad that there is work in this direction.
We have not gotten whole brain emulation to work for C. elegans, even though it has only 302 neurons.
It feels like there is some assumption that information theory solves the hard problem of consciousness, or maybe something more vague about what it means to be a person.
The thinking here often feels implicit or unclear, so I’m not sure exactly how to operationalize the disagreement, but it feels like something is very wrong here.
AGI seems more plausible than whole brain emulation, but still extraordinarily difficult or impossible.
I’m using the Annie Oakley definition of AGI: Anything You Can Do I Can Do Better.
Being able to exactly model a human would be sufficient for AGI, but I do not think that a complicated indeterministic system (like humans) can be well modeled by a complicated deterministic system plus a simple random number generator.
AGI is more difficult than being superhuman at every well-specified task because humans can do, and create, things which are not well-specified tasks.
Here are some things that humans do that I think are particularly hard for AI:
Choosing your own end goals.
It’s not clear that we would want powerful AI to have this ability.
A particular notion of creativity that involves increasing the dimension of the space of possible options, as opposed to recombining existing options.[2]
Human abilities do not just include a list of effectiveness at some particular skills—they also include the ability to create new skills.
Responding well to completely novel situations.
In particular, recursive self-improvement feels pretty unlikely, especially if people don’t actively try to make something that recursively self-improves.
TAI seems much more plausible than AGI, but still very difficult.
By ‘TAI’, I mean ‘AI that could pose an existential threat’.
Killing all humans seems very hard, but easier than replicating everything humans can or might do.
I would guess that the most important threats involve interactions with other potential x-risks, like hacking nuclear arsenals or engineering viruses.
I think that the threat from highly agentic systems is overrated, the threat from misuse is slightly underrated, and the threat from accidents should be highest rated.[3]
Aligning powerful AI does seem to be pretty hard, but not completely hopeless.
I don’t have any particular expertise here, so I’m deferring to what seems to be a moderate position in the AI safety community:
We don’t know how to align current systems.
We don’t know how to tell if a system is aligned.
We don’t know if alignment techniques will scale better or worse than capabilities.
Strong optimization seems generically dangerous and hard to predict or control.
But maybe we don’t need to get exact human values. Close is good enough.
Being able to iterate with near-human-level AI systems would be valuable, and seems likely for any particular skill (see below).
There are multiple lines of research which people are following that might bear fruit. As more people work on AI alignment, new strategies will be developed.
This feels like it depends on the relative rate of progress in alignment and in capabilities.
Partially, this will be due to technical details of how hard the problems are.
This is also something we can influence, by getting more people to do alignment research or by slowing progress in AI capabilities.
Getting alignment right is less important if AI ends up being less powerful.
The time to cross the human range seems likely to be large for any particular skill.
The range of human ability seems like it spans at least an order of magnitude for many important skills.
This is probably measurably true for some tasks: vocabulary (or number of languages) known, speed of doing arithmetic, writing speed, sales commissions, speed of learning to play a song.
This feels even more true for some immeasurable things: groundbreaking scientific research, producing great art/music/literature, impacting people’s lives.
Discontinuities in technological progress exist, but they are uncommon.
Especially recent, large discontinuities.
For the tasks for which computers are currently superhuman, it took decades to cross the human range.
Technological progress stops sometimes.
It shouldn’t be too surprising if AI capabilities development abruptly hits a wall and stops progressing for a few decades.
One particular way this might occur: it is unclear when, if ever, training something to mimic human behavior can allow it to become superhuman at that behavior.
Other possibilities include the end of Moore’s Law and running out of human-generated data to train on.
Also unknown unknowns.
It is almost guaranteed that we will encounter problems we aren’t currently considering. What is unclear is how insurmountable these future problems will be.
When you’re trying to do something hard, the effect of unknown unknowns is not symmetric. You’re more likely to encounter an unknown unknown that hinders you than helps you.
AI is probably less impactful than people expect.
We could try to measure how influential a particular industry is on the rest of the economy/society.
If a new technology from a particular industry increases that industry’s efficiency by some percent, how much does that cause overall productivity to increase or people’s quality of life to improve?
The answer could be thought of as a multiplier.
We could rank industries by this multiplier.
My guess is that energy has the largest multiplier.
It seems like, so far, information technology has an extremely low multiplier.
Even extremely widespread adoption of a new technology in this industry does not change productivity or quality of life that much.
Instead, it shows up everywhere but the productivity statistics.
This would suggest that people tend to significantly overestimate the importance of advances in IT generally.
Maybe AI will be different, but this should lower our expectations of its impact.
Governance doesn’t feel that hard.
Creating really good regulations, which facilitate all good research and prevent all dangerous research, is hard because no one knows how to distinguish good from dangerous research. If the goal is to slow or stop AI development, that is much more tractable.
Many scary-seeming technologies are heavily regulated.
I don’t quite want to say that this is the default option, but it should not be at all surprising.
It feels like more industries are over-regulated than under-regulated.
Software has an unusually small amount of regulation.
Advocacy was often necessary to create the regulation, but the advocacy wasn’t that hard and was often successful.
Public surveys suggest that many/most people are open to more regulation of AI.
Countries copy each other’s policies a lot, even when there’s little international pressure to do so.
A lot of mimicry happens with little or no effort from major actors.
Examples include medical ethics and geoengineering.
Also anecdotal evidence from people who have worked in government in small countries.
Coordinating the whole world is probably not much harder than coordinating 2-3 countries.
We already have laws that prevent people from building some weapons systems or related technologies.
Enforcement mechanisms are feasible and not dystopian.
Leading AI systems currently require a lot of computing power. If this changes, and dangerous AI can be built on a laptop, then enforcement becomes harder.
Regulation can shift techno-hype towards other technologies, which makes future regulation easier.
My p(Doom) is in the low single digits. Maybe 3%? This is a gut feeling probability, rather than a Fermi estimate.
The view from within these arguments feels even lower. I’m discounting somewhat because of uncertainty about whether these arguments are valid.
The central vision of the future I expect is that AI continues to do some impressive things and becomes more widespread, but it does not have much material impact on most people’s lives. I regard AI progress completely stopping and AI progress actually affecting the productivity statistics as similarly plausible.
I nevertheless support slowing AI development.
Even 1% is an extremely high chance of death.[4]
If you think that there is a 1% chance of AI killing everyone, that means that, in expectation, AI kills 80 million people.
This is about as bad as Stalin & Mao combined.
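The arithmetic behind this expected-value claim is simple enough to spell out (the population figure is a round 8 billion):

```python
# Expected deaths from a 1% chance of everyone dying,
# with world population taken as a round 8 billion.
world_population = 8_000_000_000
p_doom = 0.01
expected_deaths = p_doom * world_population  # 80 million in expectation
```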
COVID-19 also had about a 1% risk of death. We were willing to temporarily shut down large sections of the global economy to respond to this threat.
Pausing AI development for 6 months would be like a COVID shutdown, except only directed at one small corner of one industry.
This is a much smaller cost in response to a risk of similar scale, although differently distributed.
‘In expectation’ is probably not the best way of thinking about this.
We don’t live in the ‘in expectation’ world.
Humanity can definitely recover from losing 1% of the population. Existential risk is worse than a similar scale risk for each person independently.
You don’t bet the farm on an uncertain prospect. High variance, high expectation bets are worse than low variance, moderate expectation bets if survival is at stake.
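The “don’t bet the farm” logic can be sketched with toy numbers (the bets are hypothetical, chosen only to illustrate the variance point, not estimates about AI):

```python
# A repeated gamble with higher per-round expected value but a small
# chance of total ruin, versus a modest guaranteed gain.

def survival_probability(p_ruin, rounds):
    # Probability of never hitting ruin over many independent rounds.
    return (1 - p_ruin) ** rounds

# Bet A: 99% chance of multiplying wealth by 1.5, 1% chance of losing everything.
ev_per_round_a = 0.99 * 1.5 + 0.01 * 0.0   # 1.485 per round
# Bet B: guaranteed 1.1x.
ev_per_round_b = 1.1

# Over 200 rounds, Bet A's survival probability collapses:
p_survive_a = survival_probability(0.01, 200)   # roughly 0.13
p_survive_b = 1.0
```

Per-round expected value favors Bet A, but over 200 rounds there is only about a 13% chance of never hitting ruin, while the safe bet keeps the farm with certainty.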
Other technologies which have significant x-risk should not be pursued, or only developed extremely cautiously.
Most technologies’ x-risk is many orders of magnitude lower than 1%. Things with comparable x-risk are considered to be really dangerous.
Toby Ord estimates that nuclear weapons have an x-risk of about 1/1000.
I support nuclear non-proliferation, reducing the size of nuclear stockpiles, and strong norms against the use of nuclear weapons.
Toby Ord estimates that climate change has an x-risk of about 1/1000.
I support transitioning to conventional nuclear power, using hydro, wind, & solar where they’re most effective, and developing new power sources like fusion & deep geothermal.
Toby Ord estimates that the x-risk from new diseases is about 1/30.
I support bans on biological weapons, stopping gain of function research, faster vaccine research & approval mechanisms, better pandemic detection & monitoring systems, and maintaining latent capacity in our health care & medical manufacturing industries.
Part of my motivation is due to non-existential problems these threats pose, but x-risk itself is also really bad.
AI should be treated similarly to other technologies with a 0.1%–10% x-risk.
This means strict regulation, slow & cautious development, and considering outright bans.
The reasons I think AI x-risk is unlikely also argue against Our Glorious Future coming from AGI, so I expect that there is less to be gained by not slowing AI.
Some people claim that the benefits of AGI are so large that they outweigh the x-risk.
This is not how you should think about risk when survival is on the line.
Don’t bet the farm.
Some people are more worried about their personal death than the survival of humanity.[5]
This is pure selfishness.
This may feel persuasive to you, but no one else should be persuaded by your selfishness.
If whole brain emulation is harder than TAI, then we would face x-risk before we get digital immortality via mind uploading.
Mind uploading might not even be a thing because of the hard problem of consciousness.
There’s a lot of other exciting technologies people could be working towards instead.
Working on AI might not be the best way to advance technology in the ways you care about.
It might be better to work on the problem you care about directly than to try to route through a potentially fully general problem solving tool that has significant x-risk.
If you want to advance public health, do you work on mRNA vaccines or on AI?
If you want to colonize space, do you work on rockets or on AI?
If you want better sources of energy, do you work on batteries or on AI?
AI might not end up being a feasible way to solve these problems.
There is opportunity cost when people decide to work on AI.
Elon Musk has a space colonization company. Sam Altman has a fusion company. They could be advancing technology in more concrete ways instead of trying to create something that poses an existential threat.
CEOs are the most famous, but this is also true when people reskill in order to work on AI or when students decide that they want their career to be in AI.
More people should be thinking of atoms, not bits, in order to improve the future.
Regulation now can steer the hype train.
Technological development is path dependent, so shifting focus can have long-term consequences.
When some technologies are favored and some disfavored by regulation, it signals what the next big technology will or won’t be and encourages or discourages research in that direction.
For example, compare the trends in cost of nuclear power vs. renewables.
The time of perils doesn’t have to be ended by AI.
Space colonization would dramatically reduce x-risk.
It’s not clear to me that ‘time of perils’ is even a good way to understand our technological past, present, and future.
Even if this is a good framing, it’s not clear whether we should try to end the time of perils as quickly as possible by building AGI now, or whether it’s better to make this time less perilous by being more cautious.
If most of the risk is not from AI, and we were convinced AGI would solve the problems, then building AGI quickly would make sense.
That doesn’t seem to be the world we live in.
Regulation designed to slow AI might not even slow technological progress overall.
Many people currently working on AI would work on something else exciting.
Progress in IT has been less impactful on overall productivity and quality of life than progress in other fields.
But there are fixed costs as people transition and some people’s jobs will be less well matched to their skills.
The market should sort this out, unless people systematically overestimate the value of AI, or IT more generally. I think that they do.
Even if there is a net cost to technological progress, it could be worth it to reduce x-risk.
I don’t trust leading AI companies to be responsible.
Explicit statements that they’re racing.
Completely dismissing AI safety concerns.
Although others seem to be taking AI safety seriously.
I have a really high standard for ‘responsible’ when potential x-risks are involved.
Other people and groups in the AI community are much less responsible.
Purposefully building dangerous versions of things to get people scared.
Examples: very unaligned, agentic, weapons design.
Presumably there are plans to stop doing this before things are actually dangerous. Do they know when this is?
Even if I think something is probably impossible, that doesn’t mean I want someone to be actively trying to build it.
I don’t know of any other industry where this is a thing.
There are tests to failure, like crash tests, and red teaming to find existing flaws, but not redesigns to make things more dangerous.
Tullock’s spike is a thought experiment, not something car companies make.
While leading models are not open source, it does not seem to take very long for similar capabilities to become open source.
Leading labs sometimes provide resources for other people to make open source models.
I don’t know how hardened AI labs are against leaks.
I am sympathetic to “information wants to be free” arguments for most things, but not if it’s a potential x-risk.
We shouldn’t want nuclear weapons designs or virus engineering to be open source either.
There exist people who build dangerous things for the LOLs.
Many leading AI companies have close relationships with the AI safety community. But it also feels like they have close relationships with communities pulling them in less responsible directions.
Even if we figure out how to build AGI, solve the technical alignment problem, and leading AI labs follow the AI safety community, I’m not convinced that things will end well.
This isn’t the future I expect, but many people reading this probably hope for this future, so it is worth addressing.
I don’t think that the AI safety community is hardened against takeover.
If the AI safety community did gain significant power, various ideological groups or selfish individuals would try to gain influence over it.
It is common for political revolutions to be hijacked by a populist dictator who outmaneuvers the original leaders.
This is not a great analogy, but it is concerning.
It feels like someone who learns the shibboleths and offers money or influence could help make strategic decisions in the community.
Sam Bankman-Fried. Hopefully this has improved since the FTX collapse.
It feels like many people in the AI safety community have a limited understanding of evil.
This is great if it means that many people haven’t had to interact with terrible people.
But it is a potential source of naivety / vulnerability.
There isn’t a vision of Our Glorious Future with widespread support among the AI safety community.
The AI safety community attracts futurists of many types. The community feels simultaneously highly non-representative of society and not unified in a vision of the future.
Some people claim that superintelligence aligned to anyone’s values is good enough.
It feels like this attitude underestimates how terrible some ideologies that people endorse are.
This seems better than everyone dying from unaligned TAI, but not than continued technological progress without TAI.
The utopian dreams of many leaders of the community are often unspecified (or unconsidered?).
If they were laid out, they would likely see significant debate within, and from outside, the community.
This is a potential source of conflict in the future.
Some important unresolved questions:
Are we trying to protect humanity or usher in transhumanism?
How seriously should we take the potential rights of AI systems themselves?
How should we weigh the needs of people who currently exist, compared to the needs of potentially large numbers of people who might exist in the future?
Is there continuity of self? How does this impact existing people’s rights?
How much democratic control should there be? Or is it preferable for the most ‘rational’ people to have more control?
Should we be willing to seize significant power over the world to prevent current or future x-risks? What pivotal acts are or are not acceptable to take?
Is value lock-in the goal of alignment or something bad?
What costs should we be willing to pay to protect, or avoid harming, people who don’t want to take part in Our Glorious Future?
How much x-risk is worth tolerating to achieve Our Glorious Future? How much suffering, or restrictions on freedom, or other more mundane costs are worth tolerating to achieve Our Glorious Future?
Even if we don’t want the AI safety community to be the ones answering these questions, I would like the community to make their preferred answers more public.
It is useful to start working on problems before you have to solve them.
Having well-specified hopes gives us a standard to judge community leaders by in the future and makes takeover harder.
In order for the public to meaningfully engage with the AI safety community, it helps them to know the community’s goals.
Not telling the public what we hope to achieve with powerful AI gives them less ability to contribute to answering these questions.
If there were a vote or something where the community determined its answers to these and other questions, I would not be surprised if I strongly disagreed with the results.
This is not because I know that I would answer them correctly. If future AI systems are anywhere near as powerful as many people in the AI Safety community think they would be, then I do not trust anyone with that much power.
I don’t think that AI will be that powerful, but that does not mean that we should be trying to build it.
I think that AI doom is unlikely (but not extremely unlikely), and that we should be trying to slow or stop AI development. Most people who have higher p(Doom) than me seem wrong about the world, particularly the nature of intelligence, agency, or what it means to be human. Most people who are less willing to slow AI than me seem wrong about how to act, particularly when dealing with an x-risk. Many people in the AI safety community both have higher p(Doom) and are less willing to slow AI than me. This seems bad.
My guess is that this is mostly selection bias. The consensus view on LessWrong had been[6] that AI is very dangerous, but we shouldn’t try to slow or stop it. This has attracted people to the AI safety community who believe this. The taboo on advocating for slowing or stopping AI has only been broken within the last year or two. Already, the community has shifted significantly in this direction.
My guess is that as more people become familiar with AI safety, they’ll mostly end up close to my position:[7] skeptical, but concerned. This should make governance easier: people do not have to be completely sold on the whole AI x-risk argument to be willing to regulate, slow, or even stop potentially dangerous AI development.
[1] Which is why AI Impacts hired me.
[2] I don’t intend to engage much about this point until I’ve written more on this.
[3] It’s not clear to me whether this is underrated or appropriately rated. There’s a lot of work being done on particular problems, most of which I am not familiar with.
[4] I said 3% before, but I don’t think that the rest of the argument changes much as long as it is low single digits. 1% is more memorable.
[5] This is partially based on personal conversations I’ve had and partially based on other people who have been surprised by how common this position is.
[6] As of 2021.
[7] This is guaranteed by the typical mind fallacy.
We have a difference of perspective here which I’m struggling to articulate (I’m resisting the urge to say “Go read the Sequences!”) Human minds do lots of neat things like invent quantum mechanics. These things don’t just randomly fall out of an inscrutable Rube Goldberg mechanism. Brains do these kinds of things because they run algorithms designed to do these kinds of things. “Chaos” is not a useful, load-bearing ingredient in any algorithm. Like, AI researchers have invented many algorithms in the past, and they do lots of neat things like play superhuman Go and write code. Granted, so far, none of those algorithms do all the things that brains can do, and thus AI researchers continue to brainstorm new and different algorithms. But what these researchers are not saying is “Gee, you know what’s missing? Chaos. We need more chaos in our algorithms. That will get us into NeurIPS for sure.” Or “Gee, we need more complexity in our algorithm.” Right? The chaos or complexity might happen as a side-effect of other things, but they’re not why the algorithm works; they’re not part of the engine that extracts improbable good ideas and correct understanding and effective plans from the complexity of the world.
If it helps, here’s something I wrote a while ago:
See also here and here maybe.
If by ‘algorithm’, you mean thing-that-does-a-thing, then I think I agree. If by ‘algorithm’, you mean thing-that-can-be-implemented-in-python, then I disagree.
Perhaps a good analogy comes from quantum computing.* Shor’s algorithm is not implementable on a classical computer. It can be approximated by a classical computer, at very high cost. Qubits are not bits, or combinations of bits. They have different underlying dynamics, which makes quantum computers importantly distinct from classical computers.
The claim is that the brain is also built out of things which are dynamically distinct from bits. ‘Chaos’ here is being used in the modern technical sense, not in the ancient Greek sense to mean ‘formless matter’. Low dimensional chaotic systems can be approximated on a classical computer, although this gets harder as the dimensionality increases. Maybe this grounds out in some simple mesoscopic classical system, which can be easily modeled with bits, but it seems likely to me that it grounds out in a quantum system, which cannot.
* I’m not an expert in quantum computing, so I’m not super confident in this analogy.
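For what it’s worth, the classical half of Shor’s reduction can be written down directly: factoring N reduces to finding the multiplicative order of some a mod N, and a classical computer can only find that order by brute force, at a cost that grows exponentially with the number of digits of N. A minimal sketch:

```python
from math import gcd

# The classical reduction in Shor's algorithm: factoring N reduces to
# finding the multiplicative order r of some a mod N (the period of
# a^x mod N). A quantum computer finds r fast; classically we can only
# brute-force it.

def multiplicative_order(a, n):
    r, x = 1, a % n
    while x != 1:
        x = (x * a) % n
        r += 1
    return r

def factor_via_period(n, a):
    r = multiplicative_order(a, n)
    if r % 2 != 0:
        return None  # need an even period; retry with a different a
    y = pow(a, r // 2, n)
    f = gcd(y - 1, n)
    return f if 1 < f < n else None

# Example: 2 has order 4 mod 15, and gcd(2**2 - 1, 15) = 3 splits 15.
```

The brute-force loop is the part a quantum computer replaces; everything else is cheap on either kind of machine.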
Different kinds of computers have different operations that are fast versus slow.
On a CPU, performing 1,000,000 inevitably-serial floating point multiplications is insanely fast, whereas multiplying 10,000×10,000 floating-point matrices is rather slow. On a GPU, it’s the reverse.
By the same token, there are certain low-level operations that are far faster on quantum computers than classical computers, and vice-versa. In regards to Shor’s algorithm, of course you can compute discrete logs on classical computers, it just takes exponentially longer than with quantum computers (at least with currently-known algorithms), because quantum computers happen to have an affordance for certain fast low-level operations that enable fast calculation of the discrete log.
So anyway, it’s coherent to say that:
Maybe there is some subproblem which is extremely helpful for human-like intelligence, in the same way that calculating discrete logs is extremely helpful for factoring large numbers.
Maybe neurons and collections of neurons have particular affordances which enable blazingly-fast low-level possibly-analog solution of that subproblem. Like, maybe the dynamics of membrane proteins just happens to line up with the thing you need to do in order to approximate the solution to some funny database query thing, or whatever.
…and therefore, maybe brains can do things that would require some insanely large amount of computer chips to do.
…But I don’t think there’s any reason to believe that, and it strikes me as very implausible.
Hmm, I guess I get the impression from you of a general lack of curiosity about what’s going on here under the hood. Like, exactly what kinds of algorithmic subproblems might come up if you were building a human-like intelligence from scratch? And exactly what kind of fast low-level affordances are enabled by collections of neurons, that are not emulate-able by the fast low-level affordances of chips? Do we expect those two sets to overlap or not? Those are the kinds of questions that I’m thinking about. Whereas the vibe I’m getting from your writing—and I could be wrong—is “Human intelligence is complicated, and neurons are complicated, so maybe the latter causes the former, shrug”.
Also, in regards to Shor’s algorithm, long before quantum computers existed, we already knew how to calculate discrete logs, and we already knew that doing so would allow us to factor big numbers. It was just annoyingly slow. By contrast, I do not believe that we already know how to make a superintelligent agent, and we just don’t do it because our chips would do it very slowly. Do you agree? If so, then the thing we’re missing is not “Our chips have a different set of fast low-level affordances than do neurons, and the neuron’s set is better suited to the calculations that we need than the chips’ set.”. Right?
The impression of incuriosity is probably just because I collapsed my thoughts into a few bullet points.
The causal link between human intelligence and neurons is not just because they’re both complicated. My thought process here is something more like:
All instances of human intelligence we are familiar with are associated with a brain.
Brains are built out of neurons.
Neurons’ dynamics look very different from the dynamics of bits.
Maybe these differences are important for some of the things brains can do.
It feels pretty plausible that the underlying architecture of brains is important for at least some of the things brains can do. Maybe we will see multiple realizability where similar intelligence can be either built on a brain or on a computer. But we have not (yet?) seen that, even for extremely simple brains.
I think both that we do not know how to build a superintelligence and that, even if we knew how to model neurons, silicon chips would run such a model extremely slowly. Both things are missing.
This seems very reasonable to me, but I think it’s easy to get the impression from your writing that you think it’s very likely that:
The differences in dynamics between neurons and bits are important for the things brains do
The relevant differences will cause anything that does what brains do to be subject to the chaos-related difficulties of simulating a brain at a very low level.
I think Steven has done a good job of trying to identify a bit more specifically what it might look like for these differences in dynamics to matter. I think your case might be stronger if you had a bit more of an object level description of what, specifically, is going on in brains that’s relevant to doing things like “learning rocket engineering”, that’s also hard to replicate in a digital computer.
(To be clear, I think this is difficult and I don’t have much of an object level take on any of this, but I think I can empathize with Steven’s position here)
Not Jeffrey Heninger, but I’d argue a very clear, non-speculative advantage the brain has over the AIs of today have to do with their much better balance between memory and computation operations, and the brain doesn’t suffer from the Von Neumann Bottleneck, because the brain has both way more memory and much better memory bandwidth.
I argued for a memory size of roughly 2.5 petabytes, though even a substantial reduction in this value would still beat out pretty much all modern AI systems built today.
This is discussed in the post below: Memory bandwidth constraints imply economies of scale in AI inference.
https://www.lesswrong.com/posts/cB2Rtnp7DBTpDy3ii/memory-bandwidth-constraints-imply-economies-of-scale-in-ai
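The memory-bandwidth point can be made concrete with a back-of-the-envelope roofline estimate: if a model's weights must be streamed from memory once per decoding step, single-stream inference is memory-bound long before it is compute-bound, and batching (which shares each weight read across many sequences) is where the economies of scale in the linked post come from. The hardware numbers below are illustrative round figures, roughly in the range of a modern datacenter GPU, not measurements.

```python
# Roofline-style estimate: is LLM decoding compute-bound or memory-bound?
# Both constants are assumed round numbers, not specs of any particular chip.

PEAK_FLOPS = 300e12     # ~300 TFLOP/s peak reduced-precision throughput
MEM_BANDWIDTH = 2e12    # ~2 TB/s memory bandwidth

def tokens_per_second(n_params, batch_size, bytes_per_param=2):
    """Upper bound on decode throughput for one model replica.

    Each decoding step produces one token per sequence in the batch,
    costs ~2 * n_params FLOPs per sequence, and requires streaming the
    full weight matrix from memory once (shared across the batch).
    """
    flops_per_step = 2 * n_params * batch_size
    bytes_per_step = n_params * bytes_per_param
    steps_per_sec = min(PEAK_FLOPS / flops_per_step,   # compute ceiling
                        MEM_BANDWIDTH / bytes_per_step)  # bandwidth ceiling
    return batch_size * steps_per_sec

# A hypothetical 70B-parameter model in 2-byte precision:
single = tokens_per_second(70e9, batch_size=1)    # memory-bound, ~14 tok/s
batched = tokens_per_second(70e9, batch_size=64)  # ~64x higher throughput
```

In the memory-bound regime the compute ceiling is irrelevant, so throughput scales linearly with batch size until the compute ceiling finally binds; that asymmetry is the Von Neumann bottleneck the comment is pointing at.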
Designed by whom? Brains aren’t designed to solve physics, possibly can’t solve physics, and aren’t designed at all: they evolved instead of being designed (anything that comes out of evolution is going to be fairly Rube Goldberg). Also, there’s no firm fact that brains are computers, or run algorithms, etc.
Thanks for contributing your views. I think it’s really important for us to understand others’ views on these topics, as this helps us have sensible conversations, faster.
Most of your conclusions are premised on AGI being a difficult project from where we are now. I think this is the majority view outside of alignment circles and AGI labs (which are different from AI labs).
My main point is that our estimate of AGI difficulty should include very short timelines. We don’t know how hard AGI might be, but we also have never known how easy it might be.
After a couple of decades studying the human brain and mind, I’m afraid we’re quite close to AGI. It looks to me like the people who think most about how to build AGI tend to think it’s easier than those who don’t. This seems important. The most accurate prediction of heavier-than-air flight would’ve come from the Wright brothers (and I believe their estimate was far longer than it actually took them). As we get closer to it, I personally think I can see the route there, and that exactly zero breakthroughs are necessary. I could easily be wrong, but it seems like expertise in how minds work probably counts somewhat in making that estimate.
I think there’s an intuition that what goes on in our heads must be magical and amazing, because we’re unique. Thinking hard about what’s required to get from AI to us makes it seem less magical and amazing. Higher cognition operates on the same principles as lower cognition. And consciousness is quite beside the point (it’s a fascinating topic; I think what we know about brain function explains it rather well, but I’m resisting getting sidetracked by that because it’s almost completely irrelevant for alignment).
I’m always amazed by people saying “well sure, current AI is at human intelligence in most areas, and has progressed quickly, but it will take forever to do that last magical bit”.
I recognize that you have a wide confidence interval and take AGI seriously even if you currently think it’s far away and not guaranteed to be important.
I just question why you seem even modestly confident of that prediction.
Again, thanks for the post! You make many excellent points. I think all of these have been addressed elsewhere, and fascinating discussions exist, mostly on LW, of most of those points.
I don’t believe that “current AI is at human intelligence in most areas”. I think that it is superhuman in a few areas, within the human range in some areas, and subhuman in many areas—especially areas where the things you’re trying to do are not well specified tasks.
I’m not sure how to weight people who think most about how to build AGI vs more general AI researchers (median says HLAI in 2059, p(Doom) 5-10%) vs forecasters more generally. There’s a difference in how much people have thought about it, but also selection bias: most people who are skeptical of AGI soon are likely not going to work in alignment circles or an AGI lab. The relevant reference class is not the Wright Brothers, since hindsight tells us that they were the ones who succeeded. One relevant reference class is the Society for the Encouragement of Aerial Locomotion by means of Heavier-than-Air Machines, founded in 1863, although I don’t know what their predictions were. It might also make sense to include many groups of futurists focusing on many potential technologies, rather than just on one technology that we know worked out.
I agree that there’s a heavy self-selection bias for those working in safety or AGI labs. So I’d say both of these factors are large, and how to balance them is unclear.
I agree that you can’t use the Wright Brothers as a reference class, because you don’t know in advance who’s going to succeed.
I do want to draw a distinction between AI researchers, who think about improving narrow ML systems, and AGI researchers. There are people who spend much more time thinking about how breakthroughs to next-level abilities might be achieved, and what a fully agentic, human-level AGI would be like. The line is fuzzy, but I’d say these two ends of a spectrum exist. I’d say the AGI researchers are more like the society for aerial locomotion. I assume that society had a much better prediction than the class of engineers who’d rarely thought about integrating their favorite technologies (sailmaking, bicycle design, internal combustion engine design) into flying machines.
There are some good parts, but I’m baffled by your attitude towards “general intelligence”, and towards timelines too. You seem to say something is AGI only if it’s better at everything than you are. But isn’t an “IQ 80” human being also a general intelligence?
As far as I’m concerned, ChatGPT is already a general intelligence. It has a highly inhuman balance of skills, and I don’t believe it to be a conscious intelligence, but functionally, it is a kind of general intelligence.
What you call AGI, is more like what I would call superintelligence. And now that we have a recipe for a kind of general intelligence that can run on computers, with everything that implies—speed, copyability, modifiability—and which is being tinkered with, by literally millions of people—the fuse is lit. I have an Eric Schmidt timeline to superintelligence, 0-5 years.
That’s not a show stopper. It “just” means you have to model brains at a higher level, using floating-point weights.
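What "modeling at a higher level with floating-point weights" means in the simplest case: compress a neuron's internal dynamics into a weighted sum plus a nonlinearity, the standard artificial-neuron abstraction. Whether this abstraction captures what matters in real brains is exactly the point under dispute in this thread; the sketch below (with arbitrary example numbers) just shows the abstraction itself.

```python
import math

def artificial_neuron(inputs, weights, bias):
    """A neuron abstracted to a weighted sum and a sigmoid nonlinearity.

    All of the biophysics (membrane dynamics, spike timing, neuromodulation)
    is compressed into a handful of floating-point numbers: one weight per
    input, plus a bias.
    """
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-activation))  # a firing rate in (0, 1)

# Arbitrary example values, purely for illustration.
rate = artificial_neuron([0.5, -1.0, 2.0], [0.8, 0.3, 1.1], bias=-1.0)
```

The open question is whether everything lost in this compression (the complicated internal dynamics mentioned earlier in the post) is incidental detail or load-bearing for what brains can do.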
A completely unaligned system would be useless. Current systems aren’t completely useless, so they are at least partially aligned.
I disagree with this narrow point (leaving aside the rest). Consider a human slave that seethes at his captivity and quietly brainstorms how to escape and murder his master for revenge. I think it would be fair to describe such a person as “completely unaligned” from the perspective of his master. Nevertheless, the master can absolutely extract economically useful activity from such a slave.
What do you call a slave that actually turns on its master, or refuses to work?
I understand your comment to be a sorta “gotcha” along the lines of “If a slave hates his master and therefore refuses to work or burns the field, then owning that slave evidently was pretty useless, or even net negative.” Is that right?
If so, I think you’re kinda changing the subject or missing my point.
You initially said “A completely unaligned system would be useless.” “Useless” is a strong term. It generally means “On net, the thing is unhelpful or counterproductive.” That’s different from “There are more than zero particular situations where, if we zoom in on that one specific situation, the thing is unhelpful or counterproductive in that situation.”
Like, if I light candles, sometimes they’ll burn my house down. So are candles useless? No. Because most of the time they don’t burn my house down, but instead provide nice light and mood etc. Especially if I take reasonable precautions like not putting candles on my bed.
By the same token, if you own a slave who hates you, sometimes they will murder you. So, are slaves useless (from the perspective of a callous and selfish master)? Evidently not. As far as I understand (I’m not a historian), lots of people have owned slaves who hated them and longed for escape. Presumably they wouldn’t have bought and owned those slaves if the selfish benefits didn’t outweigh the selfish costs. Even if they were just sadistic, they wouldn’t have been able to afford this activity for very long if it was net negative on their wealth. Just like I don’t put candles on my bed, I imagine that there were “best practices” for not getting murdered by one’s slaves, including things like threat of torture (of the perpetrator slave and their family and friends), keeping slaves in chains and away from weapons, etc.
“Completely unaligned” is a pretty strong term, too. I don’t see why I shouldn’t infer “completely useless” from “completely unaligned”.
I don’t see where you are going with this. I didn’t deny that partially useful things are also partially useless, or vice versa. “Partially useful” may well be the default meaning of “useful”, but I specified “completely”.
“If DeepMind unintentionally made a superintelligent paperclip maximizer AI, then we should call this AI ‘completely misaligned’”: Agree or disagree?
If you disagree, what if it’s a human suffering maximizer AI instead of a paperclip maximizer AI?
Negatively aligned, i.e. basically evil; the paperclipper argument is about providing an alternative to that scenario.
You can’t believe all three of:
“A completely unaligned system would be [completely] useless”
A paperclip maximizer (or human suffering maximizer) is completely unaligned (or worse)
It is possible in principle to safely make some money by appropriate use of a paperclip maximizer (or human suffering maximizer), and therefore such an AI is not completely useless.
Right? If so, which of those three do you disagree with?
Alignment is a two-place predicate. If you’re into paperclips, a paperclipper is aligned with you.
OK, you may assume that none of the humans care about paperclips, and all of the humans want human suffering to go down rather than up. This includes the people who programmed the AIs, the people interacting with the AI, and human bystanders. Now can you answer the question?
(Meta-note: I think the contents of the above paragraph were very obvious from context—so much so that I’m starting to get a feeling that you’re not engaging in this discussion out of a good-faith desire to figure out why we’re disagreeing.)
So far as the slave carries out immediate work from fear of consequences, they are locally aligned with the master’s will.
If your definition of “aligned” includes “this AI will delight in murdering me as soon as it can do so without getting caught and punished, but currently it can’t do that, so instead it is being helpful” … then I don’t think you are defining the term “aligned” in a reasonable way.
More specifically, if you use the word “aligned” for an AI that wants to murder me as soon as it can get away with it (but it can’t), then that doesn’t leave us with good terminology to discuss how to make an AI that doesn’t want to murder me.
Why not just say “this AI is currently emitting outputs that I like” instead of “this AI is locally aligned”? Are we losing anything that way?
I disagree in the sense that I don’t think current systems are intelligent enough for “aligned” to be a relevant adjective. “Safe” or “controllable” seem much better, while I would reserve the term “aligned” for the much stronger property that a system is robustly behaving in accordance with our interests. I agree with Steven Byrnes that “locally aligned” doesn’t even make much sense (“performing as intended under xyz circumstances” would be much more descriptive).
I’m generally in favour of distinguishing control and alignment, but I don’t think that it makes much difference in this case. A system without some combination of control and alignment is no use.
Then it’s a problem that people keep conflating alignment with safety, even though one doesn’t imply the other. So it’d be better for TAG to rephrase it as “A completely unsafe system would be useless. Current systems aren’t completely useless, so they are at least partially safe.”
Thanks! This reads as an incredibly sober and reasonable assessment. Like many others here, I am somewhat more worried that AGI is not far out, mostly because I don’t see any compelling reason for why developments would slow.
I think this is an important point that is often missed by people dismissive of AI. If transformative AI is actually far off, then there is not much to worry about, but also not much to gain. So to assess the risks for going ahead, the probability that matters is that eventual powerful AI will in fact be safely controllable—not the total probability of x risk from AI.
I also like your point about the opportunity costs of people working on AI, both in labs and, in response, in safety efforts. This really feels like an unfortunate dynamic and makes me personally quite sad to think about.
The supposed computations downstream of “chaotic behavior” do not seem to me to be load-bearing for systems being able to do non-trivially influential things in the real world.
Humans are not “indeterministic”. Humans are deterministic computations that follow the laws of physics, and are not free from the laws of reality that constrain them.
These “not well-specified tasks” are tasks where we fill in the blanks based on our knowledge and experience of what makes sense “in distribution”. This is not at all hard to do with an AI—GPT-4 is a good real-world example of an AI capable of parsing what you call “not well-specified tasks”, even if its actions do not seem to result in “superhuman” outcomes.
I recommend reading the Sequences. It should help reduce some of the confusion inherent to how you think about these topics.