I occasionally have some thoughts about why AGI might not be as near as a lot of people seem to think, but I’m confused about how/whether to talk about them in public.
The biggest reason for not talking about them is that one person’s “here is a list of capabilities that I think an AGI would need to have, that I don’t see there being progress on” is another person’s “here’s a roadmap of AGI capabilities that we should do focused research on”. Any articulation of missing capabilities that is clear enough to be convincing seems also clear enough to get people thinking about how to achieve those capabilities.
At the same time, the community thinking that AGI is closer than it really is (if that’s indeed the case) has numerous costs, including at least:
Immense mental health costs to a huge number of people who think that AGI is imminent
People at large making bad strategic decisions that end up having major costs, e.g. not putting any money in savings because they expect it to not matter soon
Alignment people specifically making bad strategic decisions that end up having major costs, e.g. focusing on alignment approaches that might pay off in the short term and neglecting more foundational long-term research
Alignment people losing credibility and getting a reputation for crying wolf once predicted AGI advances fail to materialize
Having a better model of what exactly is missing could conceivably also make it easier to predict when AGI will actually be near. But I’m not sure to what extent this is actually the case, since the development of core AGI competencies feels more like a question of insight than of grind[1], and insight seems very hard to predict.
A benefit from this that does seem more plausible would be if the analysis of capabilities gave us information that we could use to figure out what a good future landscape would look like. For example, suppose that we aren’t likely to get AGI soon and that the capabilities we currently have will create a society that looks more like the one described in Comprehensive AI Services, and that such services could safely be used to detect signs of actually dangerous AGIs. If this were the case, then it would be important to know that we may want to accelerate the deployment of technologies that are taking the world in a CAIS-like direction, and possibly e.g. promote rather than oppose things like open source LLMs.
One argument would be that if AGI really isn’t near, then that’s going to be obvious pretty soon, and it’s unlikely that my arguments in particular for this would be all that unique—someone else would be likely to make them soon anyway. But I think this argument cuts both ways—if someone else is likely to make the same arguments soon anyway, then there’s also limited benefit in writing them up. (Of course, if it saves people from significant mental anguish, even just making those arguments slightly earlier seems good, so overall this argument seems like it’s weakly in favor of writing up the arguments.)
[1] From Armstrong & Sotala (2012):
Some AI predictions claim that AI will result from grind: i.e. lots of hard work and money. Others claim that AI will need special insights: new unexpected ideas that will blow the field wide open (Deutsch 2012).
In general, we are quite good at predicting grind. Project managers and various leaders are often quite good at estimating the length of projects (as long as they’re not directly involved in the project (Buehler, Griffin, and Ross 1994)). Even for relatively creative work, people have sufficient feedback to hazard reasonable guesses. Publication dates for video games, for instance, though often over-optimistic, are generally not ridiculously erroneous—even though video games involve a lot of creative design, play-testing, art, programming the game “AI,” etc. Moore’s law could be taken as an ultimate example of grind: we expect the global efforts of many engineers across many fields to average out to a rather predictable exponential growth.
Predicting insight, on the other hand, seems a much more daunting task. Take the Riemann hypothesis, a well-established mathematical hypothesis from 1859 (Riemann 1859). How would one go about estimating how long it would take to solve? How about the P = NP hypothesis in computing? Mathematicians seldom try to predict when major problems will be solved, because they recognize that insight is very hard to predict. And even if predictions could be attempted (the age of the Riemann hypothesis hints that it probably isn’t right on the cusp of being solved), they would need much larger error bars than grind predictions. If AI requires insights, we are also handicapped by the fact of not knowing what these insights are (unlike the Riemann hypothesis, where the hypothesis is clearly stated, and only the proof is missing). This could be mitigated somewhat if we assumed there were several different insights, each of which could separately lead to AI. But we would need good grounds to assume that.
Obviously I think it’s worth being careful, but I think in general it’s actually relatively hard to accidentally advance capabilities too much by working specifically on alignment. Some reasons:
Researchers of all fields tend to do this thing where they have really strong conviction in their direction and think everyone should work on their thing. Convincing them that some other direction is better is actually pretty hard even if you’re trying to shove your ideas down their throats.
Often the bottleneck is not that nobody realizes that something is a bottleneck, but rather that nobody knows how to fix it. In these cases, calling attention to the bottleneck doesn’t really speed things up, whereas for thinking about alignment we can reason about what things would look like if it were to be solved.
It’s generally harder to make progress on something by accident than to make progress on it on purpose when you’re really trying. I think this is true even if there is a lot of overlap. There’s also an EMH argument one could make here, but I won’t spell it out.
I think the alignment community thinking correctly is essential for solving alignment. Especially because we will have very limited empirical evidence before AGI, and that evidence will not be obviously directly applicable without some associated abstract argument, any trustworthy alignment solution has to route through the community reasoning sanely.
Also, to be clear, I think the “advancing capabilities is actually good because it gives us more information on what AGI will look like” take is very bad and I am not defending it. The arguments I made above don’t apply to it, because they basically hinge on work on alignment not actually advancing capabilities.
Hasn’t the alignment community historically done a lot to fuel capabilities?
For example, here’s an excerpt from a post I read recently:
I don’t think RLHF in particular had a very large counterfactual impact on commercialization or the arms race. The idea of non-RL instruction tuning for taking base models and making them more useful is very obvious for commercialization (there are multiple works concurrent with InstructGPT). PPO is better than just SFT or simpler approaches on top of SFT, but not groundbreakingly so. You can compare text-davinci-002 (FeedME) and text-davinci-003 (PPO) to see.
The arms race was directly caused by ChatGPT, which took off quite unexpectedly, not because of model quality due to RLHF, but because the UI was much more intuitive to users than the Playground (instruction-following GPT-3.5 was already in the API and didn’t take off in the same way). The tech tree from having a powerful base model to having a chatbot is not constrained on RLHF existing at all, either.
To be clear, I happen to also not be very optimistic about the alignment relevance of RLHF work beyond the first few papers—certainly if someone were to publish a paper today making RLHF twice as data efficient or whatever I would consider this basically just a capabilities paper.
I think empirically EA has done a bunch to speed up capabilities accidentally. And I think theoretically we’re at a point in history where simply sharing an idea can get it in the water supply faster than ever before.
A list of unsolved problems, if one of them is both true and underappreciated, can have a big impact.
The conversations I’ve had with people at DeepMind, OpenAI, and in academia make me very sure that lots of ideas for capabilities increases are already out there, so there’s a high chance anything you suggest would be something people are already thinking about. Possibly running your ideas past someone in those circles, and sharing only the ones they think are unoriginal, would be safe-ish?
I think one of the big bottlenecks is a lack of ways to predict how much different ideas would help without actually trying them at costly large scale. Unfortunately, this is also a barrier to good alignment work. I don’t have good ideas on making differential progress on this.
If {the reasoning for why AGI might not be near} comprises {a list of missing capabilities}, then my current guess is that the least-bad option would be to share that reasoning in private with a small number of relevant (and sufficiently trustworthy) people[1].
(More generally, my priors strongly suggest keeping any pointers to AGI-enabling capabilities private.)
[1] E.g. the most capable alignment researchers who seem (to you) to be making bad strategic decisions due to not having considered {the reasoning for why AGI might not be near}.
I think that sharing the reasoning in private with a small number of people might somewhat help with the “Alignment people specifically making bad strategic decisions that end up having major costs” cost, but not the others, and even then it would only help a small fraction of the people working in alignment rather than the field in general.
I mostly agree.
I also think that impact is very unevenly distributed over people; the most impactful 5% of people probably account for >70% of the impact. [1]
And if so, then the difference in positive impact between {informing the top 5%} and {broadcasting to the field in general on the open Internet} is probably not very large. [2]
Possibly also worth considering: Would (e.g.) writing a public post actually reach those few key people more effectively than (e.g.) sending a handful of direct/targeted emails? [3]
[1] Talking about AI (alignment) here, but I think something like this applies in many fields. I don’t have a good quantification of “impact” in mind, though, so this is very hand-wavey. (For one toy illustration of the kind of skewed distribution this guess implies, see the sketch after these footnotes.)
[2] Each approach has its downsides. The first approach requires identifying the relevant people, and is likely more effortful. The latter approach has the downside of putting potentially world-ending information in the hands of people who would use it to end the world (a bit sooner than they otherwise would).
[3] What is in fact the most effective way to reach whoever needs to be reached? (I don’t know.)
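To make that “top 5% account for >70%” guess a bit more concrete, here is a minimal numerical sketch of the kind of heavy-tailed distribution it implies. The Pareto shape parameter, and everything else in it, is an assumption chosen purely for illustration, not a measurement of anyone’s actual impact:

```python
import numpy as np

# Toy sketch only (illustrative assumptions, not measurements): a Pareto
# distribution with shape alpha ~= 1.14 has an analytic top-5% share of about
# 69%, so "the top 5% account for ~70% of impact" corresponds to a fairly
# ordinary heavy tail. A finite sample bounces around that value because the
# tail is heavy.
rng = np.random.default_rng(0)
alpha = 1.14
impact = rng.pareto(alpha, size=100_000) + 1.0   # classical Pareto samples, minimum 1

impact_sorted = np.sort(impact)[::-1]            # largest contributors first
top_5_percent = impact_sorted[: len(impact_sorted) // 20]
share = top_5_percent.sum() / impact_sorted.sum()
print(f"Top 5% account for roughly {share:.0%} of total impact in this sample")
```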
From a broad policy perspective, it can be tricky to know what to communicate. I think it helps if we think a bit more about the effects of our communication and a bit less about correctly conveying our level of credence in particular claims. Let me explain.
If we communicate the simple idea that AGI is near, then it pushes people to work on safety projects that would be good to work on even if AGI is not near, while paying some costs in terms of reputation, mental health, and personal wealth.
If we communicate the simple idea that AGI is not near, then people will feel less need to work on safety soon. This would let them avoid missing out on opportunities that would be good to take before they actually need to focus on AI safety.
We can only really communicate one thing at a time to people. Also, we should worry more about tail risks from false positives (thinking we can build AGI safely when we cannot) than from false negatives (thinking we can’t build AGI safely when we can). Taking these two facts into consideration, I think the policy implication is clear: unless there is extremely strong evidence that AGI is not near, we must act and communicate as if AGI is near.
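To make the asymmetry explicit, here is a minimal toy expected-cost calculation. The probability and the two cost figures are made-up assumptions for illustration only; the point is just the structure of the comparison:

```python
# Toy expected-cost comparison (all numbers are illustrative assumptions).
# "False positive" here: acting/communicating as if AGI is near when it is not.
# "False negative": acting/communicating as if AGI is not near when it is.
p_near = 0.2                  # assumed probability that AGI really is near
cost_false_positive = 1.0     # reputation, mental health, savings costs, etc.
cost_false_negative = 50.0    # being unprepared for an AGI that does arrive

expected_cost_act_as_if_near = (1 - p_near) * cost_false_positive      # 0.8
expected_cost_act_as_if_not_near = p_near * cost_false_negative        # 10.0

print(f"Act as if near:     expected cost {expected_cost_act_as_if_near:.1f}")
print(f"Act as if not near: expected cost {expected_cost_act_as_if_not_near:.1f}")

# With these numbers, "act as if not near" only becomes the cheaper policy once
# p_near drops below 1/51 (about 2%), i.e. only given very strong evidence that
# AGI is not near.
```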
I reached this via Joachim pointing it out as an example of someone urging epistemic defection around AI alignment, and I have to agree with him there. I think the higher difficulty posed by communicating “we think there’s a substantial probability that AGI happens in the next 10 years” vs “AGI is near” is worth it even from a PR perspective, because pretending you know the day and the hour smells like bullshit to the most important people who need convincing that AI alignment is nontrivial.
I left a comment over in the other thread, but I think Joachim misunderstands my position.
In the above comment I’ve taken for granted that there’s a non-trivial possibility that AGI is near, so I’m not arguing that we should say “AGI is near” regardless of whether it is or not. We don’t know whether it is; we only have our guesses about it. And so long as there’s a non-trivial chance that AGI is near, I think that’s the more important message to communicate.
Overall it would be better if we could communicate something like “AGI is probably near”, but “probably” and similar terms are going to get rounded off, so even if you do literally say “AGI is probably near” or similar, that’s not what people will hear. And if you’re going to say “probably”, my argument is that it’s better if they round the “probably” off to “near” rather than to “not near”.
I agree with “When you say ‘there’s a good chance AGI is near’, the general public will hear ‘AGI is near’”.
However, the general public isn’t everyone, and the people who can distinguish between the two claims are the most important to reach (per capita, and possibly in sum).
So we’ll do better by saying what we actually believe, while taking into account that some audiences will round probabilities off (and seeking ways to be rounded closer to the truth while still communicating accurately to anyone who does understand probabilistic claims). The marginal gain from rounding ourselves off at the start isn’t worth the marginal loss from looking transparently overconfident to those who can tell the difference.
I’m replying only here because spreading discussion over multiple threads makes it harder to follow.
You left a reply on a question asking how to communicate about reasons why AGI might not be near. The question refers to the costs of “the community” thinking that AGI is closer than it really is as a reason to communicate about reasons it might not be so close.
So I understood the question as asking about communication with the community (my guess: the community of people seriously working on and thinking about AI-safety-as-in-AI-not-killing-everyone), where it’s important to actually try to figure out the truth.
You replied (as I understand it) that when we communicate to the general public we can transmit only one idea, and so we should communicate that AGI is near (if we assign a not-very-low probability to that).
I think the biggest problem is that posting your “general public communication” answer as a reply to a question asking about “community communication” pushes towards less clarity in the community, where I think clarity is important.
I’m also not sold on the “you can communicate only one idea” thing, but I mostly don’t care to talk about it right now (it would be nice if someone else worked it out for me, but right now I don’t have the capacity to do it myself).
Ah, I see. I have to admit, I write a lot of my comments between other things, and I missed that the context of the post could cause my words to be interpreted this way. These days I’m often in executive mode rather than scholar mode and miss nuance if it’s not clearly highlighted; hence my misunderstanding, but it also reflects where I’m coming from with this answer!
Reading Habryka’s recent discussion might give some inspiration.
Whatever the probability of AGI in the reasonably near future (5-10 years), the probability of societal shifts due to the deployment of highly capable yet sub-AGI AI is strictly higher. I think that regardless of where AI “lands” in terms of slowing down in progress (if we do see an AI winter/fall), the application of systems that exist even just today, even if technological progress were to stop, is enough to merit appreciating the different world that is coming: a difference within the same order of magnitude as the difference AGI would make.
I think it’s almost impossible at this point to argue against the value of foresight with respect to the rise of dumb (relative to AGI) but highly capable AI.
I think it is okay for you to be vague. Simply saying that you can see numerous bottlenecks, but don’t wish to list them to avoid others working on them, is enough to cause some update, even if a weaker one than a full list would cause.