[The original question was “Is OpenAI increasing the existential risks related to AI?” I changed it to the current one following a discussion with Rohin in the comments. It clarifies that my question asks about the consequences of OpenAI’s work, assuming positive and aligned intentions.]
This is a question I’ve been asked recently by friends interested in AI Safety and EA. Usually this question comes from discussions around GPT-3 and the tendency of OpenAI to invest a lot in capabilities research.
[Following this answer by Vaniver, I propose as a baseline/counterfactual the world where OpenAI doesn’t exist but the researchers there still do.]
Yet I haven’t seen it discussed here. Is it a debate we failed to have, or has there already been some discussion around it? I found a post from 3 years ago, but I think the situation probably changed in the meantime.
A couple of arguments for and against to prompt your thinking:
OpenAI is increasing the existential risks related to AI because:
They are doing far more capability research than safety research;
They are pushing the state of the art of capability research;
Their results will motivate many people to go work on AI capabilities, whether out of wonder or out of fear of unemployment.
OpenAI is not increasing the existential risks related to AI because:
They have a top-notch safety team;
They restrict access to their models, either by not releasing them outright (GPT-2) or by bottlenecking access through their API (GPT-3);
Their results are showing the potential dangers of AI, and pushing many people to go work on AI safety.
[Speaking solely for myself in this comment; I know some people at OpenAI, but don’t have much in the way of special info. I also previously worked at MIRI, but am not currently.]
I think “increasing” requires some baseline, and I don’t think it’s obvious what baseline to pick here.
For example, consider instead the question “is MIRI decreasing the existential risks related to AI?”. Well, are we comparing to the world where everyone currently employed at MIRI vanishes? Or are we comparing to the world where MIRI as an organization implodes, but the employees are still around, and find jobs somewhere else? Or are we comparing to the world where MIRI as an organization gets absorbed by some other entity? Or are we comparing to the world where MIRI still exists, the same employees still work there, but the mission is somehow changed to be the null mission?
Or perhaps we’re interested in the effects on the margins—if MIRI had more dollars to spend, or fewer, how would the existential risks change? Even the answers to those last two questions could easily be quite different—perhaps firing any current MIRI employee would make things worse, but there are no additional people that could be hired by MIRI to make things better. [Prove me wrong!]
---
With that preamble out of the way, I think there are three main obstacles to discussing this in public, a la Benquo’s earlier post.
The main one is something like “appeals to consequences.” Talking in public has two main functions: coordinating and information-processing, and it’s quite difficult to separate the two functions. [See this post and the related posts at the bottom.] Suppose I think OpenAI makes humanity less safe, and I want humanity to be more safe; I might try to figure out which strategy will be most persuasive (while still correcting me if I’m the mistaken one!) and pursue that strategy, instead of employing a strategy that more quickly ‘settles the question’ at the cost of making it harder to shift OpenAI’s beliefs. More generally, the people with the most information will be people closest to OpenAI, which probably makes them more careful about what they will or won’t say. There also seem to be significant asymmetries here, as it might be very easy to say “here are three OpenAI researchers I think are making existential risk lower” but very difficult to say “here are three OpenAI researchers I think are making existential risk higher.” [Setting aside the social costs, there’s their personal safety to consider.]
The second one is something like “prediction is hard.” One of my favorite math stories is the history of the Markov chain; in the version I heard, Markov’s rival said a thing, Markov thought to himself “that’s not true!” and then formalized the counterexample in a way that dramatically improved that field. Suppose Benquo’s story of how OpenAI came about is true, that OpenAI will succeed at making beneficial AI, and that (counterfactually) DeepMind wouldn’t have succeeded. In this hypothetical world, the direct effect of DeepMind on existential AI risk would have been negative, but the indirect effect would be positive (as otherwise OpenAI, which succeeded, wouldn’t have existed). While we often think we have a good sense of the direct effect of things, in complicated systems it becomes very non-obvious what the total effects are.
The third one is something like “heterogeneity.” Rather than passing a judgment on the org as a whole, it would make more sense to make my judgments more narrow; “widespread access to AI seems like it makes things worse instead of better,” for example, which OpenAI seems to already have shifted their views on, instead focusing on widespread benefits instead of widespread access.
---
With those obstacles out of the way, here’s some limited thoughts:
I think OpenAI has changed for the better in several important ways over time; for example, the ‘Open’ part of the name is not really appropriate anymore, but this seems good instead of bad on my models of how to avoid existential risks from AI. I think their fraction of technical staff devoted to reasoning about and mitigating risks is higher than DeepMind’s, although lower than MIRI’s (though MIRI’s fraction is a very high bar); I don’t have a good sense of whether that fraction is high enough.
I think the main effects of OpenAI are the impacts they have on the people they hire (and the impacts they don’t have on the people they don’t hire). There are three main effects to consider here: resources, direction-shifting, and osmosis.
On resources, imagine that there’s Dr. Light, whose research interests point in a positive direction, and Dr. Wily, whose research interests point in a negative direction, and the more money you give to Dr. Light the better things get, and the more money you give to Dr. Wily, the worse things get. [But actually what we care about is counterfactuals; if you don’t give Dr. Wily access to any of your compute, he might go elsewhere and get similar amounts of compute, or possibly even more.]
On direction-shifting, imagine someone has a good idea for how to make machine learning better, and they don’t really care what the underlying problem is. You might be able to dramatically change their impact by pointing them at cancer-detection instead of missile guidance, for example. Similarly, they might have a default preference for releasing models, but not actually care much if management says the release should be delayed.
On osmosis, imagine there are lots of machine learning researchers who are mostly focused on technical problems, and mostly get their ‘political’ opinions for social reasons instead of philosophical reasons. Then the main determinant of whether they think that, say, the benefits of AI should be dispersed or concentrated might be whether they hang out at lunch with people who think the former or the latter.
I don’t have a great sense of how those factors aggregate into an overall sense of “OpenAI: increasing or decreasing risks?”, but I think people who take safety seriously should consider working at OpenAI, especially on teams clearly related to decreasing existential risks. [I think people who don’t take safety seriously should consider taking safety seriously.]
I would reemphasize that “does OpenAI increase risks?” is a counterfactual question. That means we need to be clearer about what we are asking as a matter of predicting what the counterfactuals are, and consider strategy options for going forward. This is a major set of questions, and increasing or decreasing risks as a single metric isn’t enough to capture much of interest.
For a taste of what we’d want to consider, what about the following:
Are we asking OpenAI to pick a different, “safer” strategy?
Perhaps they should focus more on hiring people to work on safety and strategy, and hire fewer capabilities researchers. That brings us to the Dr. Wily/Dr. Light question—perhaps Dr. Capabilities B. Wily shouldn’t be hired, and Dr. Safety R. Light should be instead. That means Wily does capabilities research elsewhere, perhaps with more resources, and Light does safety research at OpenAI. But the counterfactual is that Light would do (perhaps slightly less well funded) safety research anyways, and Wily would work on (approximately as useful) capabilities research at OpenAI—advantaging OpenAI in any capabilities races in the future.
Are we asking OpenAI to be larger, and (if needed,) we should find them funding?
Perhaps they should hire both, along with all of Dr. Light’s and Dr. Wily’s research teams. Fast growth will dilute OpenAI’s culture, but give them an additional marginal advantage over other groups. Perhaps bringing them in would help OpenAI in race dynamics, but make it more likely that they’d engage in such races.
How much funding would this need? Perhaps none—they have cash, they just need to do this. Or perhaps tons, and we need them to be profitable, and focus on that strategy, with all of the implications of that. Or perhaps a moderate amount, and we just need OpenPhil to give them another billion dollars, and then we need to ask about the counterfactual impact of that money.
Or OpenAI should focus on redirecting their capabilities staff to work on safety, and have a harder time hiring the best people who want to work on capabilities? Or OpenAI should be smaller and more focused, and reserve cash?
These are all important questions, but they need much more time than I, or I suspect most of the readers here, have available—and they are probably already being discussed more usefully by both OpenAI and their advisors.
Also apparently Megaman is less popular than I thought so I added links to the names.
Oh. Right. I should have gotten the reference, but wasn’t thinking about it.
Fwiw I recently listened to the excellent song ‘The Good Doctor’ which has me quite delighted to get random megaman references.
Just so you know, I got the reference. ;)
Thanks a lot for this great answer!
First, I should have written it, but my baseline (or my counterfactual) is a world where OpenAI doesn’t exist but the people working there still do. This might be an improvement if you think that pushing the scaling hypothesis is dangerous and that most of the safety team would find money to keep working, or an issue if you think someone else, probably less aligned, would have pushed the scaling hypothesis, and that the structure given by OpenAI to its safety team is really special and important.
As for your obstacles, I agree that they pose problems. That’s why I don’t expect a full answer to this question. On the other hand, as you show with the end of your post, I still believe we can have a fruitful discussion and debate on some of the issues. This might result in a different stance toward OpenAI, or arguments for defending it, or something completely different. Either way, I think there is something to be gained by having this discussion.
This aims at one criticism of OpenAI I often see: the amount of resources they give to capability research. Your other arguments (particularly osmosis) might influence this, but there’s an intuitive reason why you might want to only give resources to the Dr. Lights out there.
On the other hand, your counterfactual world hints that maybe redirecting Dr. Wily, or putting him in an environment where safety issues are mentioned a lot, might help steer his research in a positive direction.
Here too, I can read this point as implying either a positive or a negative impact for OpenAI. On the positive side, the constraints on model releases, and the very fact of having a safety team and discussing safety, might push new researchers to go into safety or to consider more safety-related issues in their work. But on the negative side, GPT-3 (as an example) is really cool. If you’re a young student, you might be convinced by it to go work on AI capabilities, without much thought about safety.
This is probably the most clearly positive point for OpenAI. Still, I’m curious how much safety plays a role in the culture of OpenAI. For example, are all researchers and engineers made aware of safety issues? If that’s the case, then the culture would seem to lessen the risks significantly.
But part of the problem here is that the question “what’s the impact of our stance on OpenAI on existential risks?” is potentially very different from “is OpenAI’s current direction increasing or decreasing existential risks?”, and as people outside of OpenAI have much more control over their stance than they do over OpenAI’s current direction, the first question is much more actionable. And so we run into the standard question substitution problems, where we might be pretending to talk about a probabilistic assessment of an org’s impact while actually targeting the question of “how do I think people should relate to OpenAI?”.
[That said, I see the desire to have clear discussion of the current direction, and that’s why I wrote as much as I did, but I think it has prerequisites that aren’t quite achieved yet.]
Post OpenAI exodus update: does the exit of Dario Amodei, Chris Olah, Jack Clark, and potentially others from OpenAI make you change your opinion?
I think it’s fairly self-evident that you should have exceedingly high standards for projects intending to build AGI (OpenAI, DeepMind, others). It’s really hard to reduce existential risk from AI, and I think much thought around this has been naive and misguided.
(Two examples of this outside of OpenAI include: senior AI researchers talking about military use of AI instead of misalignment, and senior AI researchers saying responding to the problems of specification gaming by saying “objectives can be changed quickly when issues surface” and “existential threats to humanity have to be explicitly designed as such”.)
An obvious reason to think OpenAI’s impact will be net negative is that they seem to be trying to reach AGI as fast as possible, and trying a route different from DeepMind and other competitors, so in some worlds they are shortening the timeline to AGI. (I’m aware that there are arguments about why a shorter timeline is better, but I’m not sold on them right now.)
There are also more detailed conversations, about alignment, what the core of the problem actually is, and other strategic questions. I expect (based on occasional things I hear) that I have substantial disagreements with OpenAI decision-makers, which I think alone is sufficient reason for me to feel doomy about humanity’s prospects.
That said, I’m quite impressed with their actions around release practices and also their work in becoming a profit-capped entity. I felt like they were a live player with these acts and were clearly acting against their short-term self-interest in favour of humanity’s broader good, with some relatively sane models around these specific aspects of what’s important. Those were both substantial updates for me, and make me feel pretty cooperative with them.
And of course I’m very happy indeed about a bunch of the safety work they do and support. The org gives lots of support and engineers to people like Paul Christiano, Chris Olah, etc., which I think is better than what those people would probably get counterfactually, and I’m very grateful that the organisation provides this.
Overall I don’t feel my opinion is very robust, and it could easily change. Here are some examples of things that I think could substantially change my opinion:
How senior decision-making happens at OpenAI
What technical models of AGI senior researchers at OpenAI have
Broader trends that would have happened to the field of AI (and the field of AI alignment) in the counterfactual world where they were not founded
Thanks for your answer! Trying to make your examples of what might change your opinion substantially more concrete, I got these:
Does senior decision-making at OpenAI always consider safety issues before greenlighting new capability research?
Do senior researchers at OpenAI believe that their current research directly leads to AGI in the short term?
Would the Scaling Hypothesis (and thus GPT-N) have been vindicated as soon in a world without OpenAI?
Do you agree with these? Do you have other ideas of concrete questions?
The first one feels a bit too optimistic. It’s something more like: Are they able to be direct in their disagreement with one another? What level of internal politicking is there? How much ability do some of the leadership have to make unilateral decisions? Etc.
The second one is the one more about alignment, takeoff dynamics, and timelines. All the details, like the likelihood of Mesa optimisers. What are their thoughts on this, and how much do they think about it?
For the third, that one’s good. Also things about how differently things would’ve gone at DeepMind, and how good/bad the world would be if Musk hadn’t shifted the Overton window so much (which I think is counterfactually linked with OpenAI existing: you get both or neither).
Post OpenAI exodus update: does the exit of Dario Amodei, Chris Olah, Jack Clark, and potentially others from OpenAI make you change your opinion?
See all the discussion under the OpenAI tag. Don’t forget SSC’s post on it either.
I mostly think we had a good discussion about it when it launched (primarily due to Ben Hoffman and Scott Alexander deliberately creating the discussion).
Do you think you (or someone else) could summarize this discussion here? I have to admit that the ideas being spread out between multiple posts doesn’t help.
I don’t plan to.
I’d strong-upvote if someone else did a nice job of summarising the discussion, perhaps inspired by how I distilled the discussion around what failure looks like.
(To be clear I think my distillation of the comment section was much better and more useful than the distillation of the post itself.)
Putting aside the general question (is OpenAI good for the world?), I want to consider the smaller question: how do OpenAI’s demonstrations of scaled-up versions of current models affect AI safety?
I think there’s a much easier answer to this. Any risks we face from scaling up models we already have, with funding much less than tens of billions of dollars, amount to unexploded uranium sitting around that we’re refining in microgram quantities. The absolute worst that can happen with connectionist architectures is that we solve all the hard problems without ever having run the trivial scaled-up variants; scaling up remains trivial, and so that final step to superhuman AI also becomes trivial.
Even if scaling up ahead of time results in slightly faster progress towards AGI, it seems that it at least makes it easier to see what’s coming, as incremental improvements require research and thought, not just trivial quantities of dollars.
Going back to the general question, one good I see OpenAI producing is the normalization of the conversation around AI safety. It is important for authority figures to be talking about long-term outcomes, and in order to be an authority figure, you need a shiny demo. It’s not obvious how a company could be more authoritative than OpenAI while being less novel.
Post OpenAI exodus update: does the exit of Dario Amodei, Chris Olah, Jack Clark, and potentially others from OpenAI make you change your opinion?
To the question of how OpenAI’s demonstrations of scaled-up versions of current models affect AI safety, I don’t think much changes. It does seem that OpenAI is aiming to go beyond simple scaling, which seems much riskier.
As to the general question, certainly that news makes me more worried about the state of things. I know way too little about the decision to be more concrete than that.
OpenAI’s work speeds up progress, but in a way that’s likely to smooth progress later on. If you spend as much compute as possible now, you reduce potential surprises in the future.
But what if they reach AGI during this speed-up? The smoothing at a later time assumes that we’ll hit diminishing returns before AGI, which is not what we observe at the moment.
I agree, but I think it’s unlikely OpenAI will be the first to build AGI.
(Except maybe if it turns out AGI isn’t economically viable).
If OpenAI changed direction tomorrow, how long would that slow the progress to larger models? I can’t see it lasting; the field of AI is already incessantly moving towards scale, and big models are better. Even in a counterfactual where OpenAI never started scaling models, is this really something that no other company can gradient descent on? Models were getting bigger without OpenAI, and the hardware to do it at scale is getting cheaper.
Well, if we take this comment by gwern at face value, it clearly seems that no one with the actual resources has any interest in doing it for now. Based on these premises, scaling towards incredibly larger models would probably not have happened for years.
So I do think that if you believe this is wrong, you should be able to show where gwern’s comment is wrong.
Gwern’s claim is that these other institutions won’t scale up as a consequence of believing the scaling hypothesis; that is, they won’t bet on it as a path to AGI, and thus won’t spend this money, on abstract or philosophical grounds.
My point is that this only matters on short-term scales. None of these companies are blind to the obvious conclusion that bigger models are better. The difference between a hundred-trillion dollar payout and a hundred-million dollar payout is philosophical when you’re talking about justifying <$5m investments. NVIDIA trained an 8.3 B parameter model as practically an afterthought. I get the impression Microsoft’s 17 B parameter Turing-NLG was basically trained to test DeepSpeed. As markets open up to exploit the power of these larger models, the money spent on model scaling is going to continue to rise.
These companies aren’t competing with OpenAI. They’ve built these incredibly powerful systems incidentally, because it’s the obvious way to do better than everyone else. It’s a tool they use for market competitiveness, not as a fundamental insight into the nature of intelligence. OpenAI’s key differentiator is only that they view scale as integral and explanatory, rather than an incidental nuisance.
With this insight, OpenAI can make moonshots that the others can’t: build a huge model, scale it up, and throw money at it. Without this understanding, others will only get there piecewise, scaling up one paper at a time. The delta between the two is at best a handful of years.
The scaling hypothesis implies that it’ll happen eventually, yes: but the details matter a lot. One way to think of it is Eliezer’s quip: the IQ necessary to destroy the world drops by 1 point per year. Similarly, to do scaling or bitter-lesson-style research, you need

resources * fanaticism < a constant.

This constant seems to be very small, which is why compute had to drop all the way to ~$1k before any researchers worldwide were fanatical enough to bother trying CNNs and create AlexNet. Countless entities and companies could have used this ‘obvious way to do better than everyone else, for market competitiveness’ for years—or decades—beforehand. But they didn’t.

For the question of who gets there first, ‘a handful of years’ is decisive. So this is pretty important if you want to think about the current plausible AGI trajectories, which for many people (even excluding individuals like Moravec, or Shane Legg, who has projected out to ~2028 for a long time now) have shrunk rapidly to timescales on which ‘a handful of years’ represents a large fraction of the outstanding timeline!
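The threshold idea behind the quip can be sketched as a toy model. This is my own framing with made-up numbers (the function name and the $10k/$1M figures are illustrative assumptions, not gwern’s model): as the compute cost of a fixed experiment falls, the willingness-to-spend (“fanaticism”) required of any single actor to attempt it falls too.

```python
# Toy sketch (illustrative numbers only): the cheaper an experiment,
# the less fanatical an actor must be to stake resources on it.

def fanaticism_required(experiment_cost_usd: float, budget_usd: float) -> float:
    """Fraction of an actor's budget they must be willing to stake."""
    return experiment_cost_usd / budget_usd

# An AlexNet-scale run at ~$1k asks a lone academic with a $10k
# hardware budget to stake 10% of it:
print(fanaticism_required(1_000, 10_000))      # → 0.1
# The same run at an assumed ~$1M a decade earlier would require
# staking 100x the budget, which effectively nobody was fanatical
# enough to do:
print(fanaticism_required(1_000_000, 10_000))  # → 100.0
```

On this reading, falling compute prices are what move scaling experiments under the constant for at least a few actors, which is consistent with the claim that the constant is very small.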
Incidentally, it has now been 86 days since the GPT-3 paper was uploaded, or a quarter of a year. Excluding GShard (which as a sparse model is not at all comparable parameter-wise), as far as I know no one has announced any new (dense) models which are even as large as Turing-NLG—much less larger than GPT-3.
A fairly minor point, but I don’t quite follow the formula / analogy. Don’t resources and fanaticism help you do the scaling research? So shouldn’t it be a > sign rather than <, and shouldn’t we say that the constant is large rather than small?
I agree this makes a large fractional change to some AI timelines, and has significant impacts on questions like ownership. But when considering very short timescales, while I can see OpenAI halting their work would change ownership, presumably to some worse steward, I don’t see the gap being large enough to materially affect alignment research. That is, it’s better OpenAI gets it in 2024 than someone else gets it in 2026.
It’s hard to be fanatical when you don’t have results. Nowadays AI is so successful it’s hard to imagine this being a significant impediment.
I wouldn’t dismiss GShard altogether. The parameter counts aren’t equal, but MoE(2048E, 60L) is still a beast, and it opens up room for more scaling than a standard model.
Post OpenAI exodus update: does the exit of Dario Amodei, Chris Olah, Jack Clark, and potentially others from OpenAI make you change your opinion?
No. Amodei led the GPT-3 project, he’s clearly not opposed to scaling things. Idk why they’re leaving but since they’re all starting a new thing together, I presume that’s the reason.
I think that OpenAI is certainly reducing massive misuse risks of AI. By existing, they have made it a significant chance that a capped-profit entity will be the first to develop transformative AI. Without them, it’s much more likely that a 100% for-profit company would be the first, and a for-profit company is more likely to misuse the power of being the first to have this power than a capped-profit entity.
As for misaligned AI, I’m not sure: while I think they’re unlikely to develop a super-powerful misaligned AI themselves, as the OP says, they are accelerating the development of AI capabilities for everyone and spreading awareness of these capabilities.