Common misconceptions about OpenAI
I have recently encountered a number of people with misconceptions about OpenAI. Some common impressions are accurate, and others are not. This post is intended to provide clarification on some of these points, to help people know what to expect from the organization and to figure out how to engage with it. It is not intended as a full explanation or evaluation of OpenAI’s strategy.
The post has three sections:
Common accurate impressions
Common misconceptions
Personal opinions
The bolded claims in the first two sections are intended to be uncontroversial, i.e., most informed people would agree with how they are labeled (correct versus incorrect). I am less sure about how commonly believed they are. The bolded claims in the last section I think are probably true, but they are more open to interpretation and I expect others to disagree with them.
Note: I am an employee of OpenAI. Sam Altman (CEO of OpenAI) and Mira Murati (CTO of OpenAI) reviewed a draft of this post, and I am also grateful to Steven Adler, Steve Dowling, Benjamin Hilton, Shantanu Jain, Daniel Kokotajlo, Jan Leike, Ryan Lowe, Holly Mandel and Cullen O’Keefe for feedback. I chose to write this post and the views expressed in it are my own.
Common accurate impressions
Correct: OpenAI is trying to directly build safe AGI.
OpenAI’s Charter states: “We will attempt to directly build safe and beneficial AGI, but will also consider our mission fulfilled if our work aids others to achieve this outcome.” OpenAI leadership describe trying to directly build safe AGI as the best way to currently pursue OpenAI’s mission, and have expressed concern about scenarios in which a bad actor is first to build AGI and chooses to misuse it.
Correct: the majority of researchers at OpenAI are working on capabilities.
Researchers on different teams often work together, but at the time of writing it is still reasonable to loosely categorize OpenAI’s researchers (around half the organization) as approximately:
Capabilities research: 100
Alignment research: 30
Policy research: 15
Correct: the majority of OpenAI employees did not join with the primary motivation of reducing existential risk from AI specifically.
My strong impressions, which are not based on survey data, are as follows. Across the company as a whole, a minority of employees would cite reducing existential risk from AI as their top reason for joining. A significantly larger number would cite reducing risk of some kind, or other principles of beneficence put forward in the OpenAI Charter, as their top reason for joining. Among people who joined to work in a safety-focused role, a larger proportion of people would cite reducing existential risk from AI as a substantial motivation for joining, compared to the company as a whole. Some employees have become motivated by existential risk reduction since joining OpenAI.
Correct: most interpretability research at OpenAI stopped after the Anthropic split.
Chris Olah led interpretability research at OpenAI before becoming a cofounder of Anthropic. Although several members of Chris’s former team still work at OpenAI, most of them are no longer working on interpretability.
Common misconceptions
Incorrect: OpenAI is not working on scalable alignment.
OpenAI has teams focused both on practical alignment (trying to make OpenAI’s deployed models as aligned as possible) and on scalable alignment (researching methods for aligning models that are beyond human supervision, which could potentially scale to AGI). These teams work closely with one another. OpenAI’s recently released alignment research includes self-critiquing models (AF discussion), InstructGPT, WebGPT (AF discussion) and book summarization (AF discussion). OpenAI’s approach to alignment research is described here, and includes as a long-term goal an alignment MVP (AF discussion).
Incorrect: most people who were working on alignment at OpenAI left for Anthropic.
The main group of people working on alignment (other than interpretability) at OpenAI at the time of the Anthropic split at the end of 2020 was the Reflection team, which has since been renamed to the Alignment team. Of the 7 members of the team at that time (who are listed on the summarization paper), 4 are still working at OpenAI, and none are working at Anthropic. Edited to add: this fact alone is not intended to provide a complete picture of the Anthropic split, which is more complicated than I am able to explain here.
Incorrect: OpenAI is a purely for-profit organization.
OpenAI has a hybrid structure in which the highest authority is the board of directors of a non-profit entity. The members of the board of directors are listed here. In legal paperwork signed by all investors, it is emphasized that: “The [OpenAI] Partnership exists to advance OpenAI Inc [the non-profit entity]‘s mission of ensuring that safe artificial general intelligence is developed and benefits all of humanity. The General Partner [OpenAI Inc]’s duty to this mission and the principles advanced in the OpenAI Inc Charter take precedence over any obligation to generate a profit. The Partnership may never make a profit, and the General Partner is under no obligation to do so.”
Incorrect: OpenAI is not aware of the risks of race dynamics.
OpenAI’s Charter contains the following merge-and-assist clause: “We are concerned about late-stage AGI development becoming a competitive race without time for adequate safety precautions. Therefore, if a value-aligned, safety-conscious project comes close to building AGI before we do, we commit to stop competing with and start assisting this project. We will work out specifics in case-by-case agreements, but a typical triggering condition might be “a better-than-even chance of success in the next two years.””
Incorrect: OpenAI leadership is dismissive of existential risk from AI.
OpenAI has a Governance team (within Policy Research) that advises leadership and is focused on strategy for avoiding existential risk from AI. In multiple recent all-hands meetings, OpenAI leadership have emphasized to employees the need to scale up safety efforts over time, and encouraged employees to familiarize themselves with alignment ideas. OpenAI’s Chief Scientist, Ilya Sutskever, recently pivoted to spending 50% of his time on safety.
Personal opinions
Opinion: OpenAI leadership cares about reducing existential risk from AI.
I think that OpenAI leadership are familiar with the basic case for concern, agree with it, and appreciate the magnitude of what’s at stake. Existential risk is an important factor, but not the only factor, in OpenAI leadership’s decision making. OpenAI’s alignment work is much more than just a token effort.
Opinion: capabilities researchers at OpenAI have varying attitudes to existential risk.
I think that capabilities researchers at OpenAI have a wide variety of views, including some with long timelines who are skeptical of attempts to mitigate risk now, and others who are supportive but may consider the question to be outside their area of expertise. Some capabilities researchers actively look for ways to help with alignment, or to learn more about it.
Opinion: disagreements about OpenAI’s strategy are substantially empirical.
I think that some of the main reasons why people in the alignment community might disagree with OpenAI’s strategy are largely disagreements about empirical facts. In particular, compared to people in the alignment community, OpenAI leadership tend to assign more probability to slow takeoff, are more optimistic about solving alignment (especially via empirical methods that rely on capabilities), and are more concerned about bad actors developing and misusing AGI. I would expect OpenAI leadership to change their mind on these questions given clear enough evidence to the contrary.
Opinion: I am personally extremely uncertain about strategy-related questions.
I do not spend most of my time thinking about strategy. If I were forced to choose between OpenAI speeding up or slowing down its work on capabilities, my guess is that I would end up choosing the latter, all else equal, but I am very unsure.
Opinion: OpenAI’s actions have drawn a lot of attention to large language models.
I think that the release of GPT-3 and the OpenAI API led to significantly increased focus on, and somewhat of a competitive spirit around, large language models. I consider this to have both advantages and disadvantages. I don’t think OpenAI predicted this in advance, and I believe it would have been challenging, but not impossible, to foresee.
Opinion: OpenAI is deploying models in order to generate revenue, but also to learn about safety.
I think that OpenAI is trying to generate revenue through deployment both to directly create value and to fund further research and development. At the same time, it also uses deployment as a way to learn a variety of things, about safety in particular.
Opinion: OpenAI’s particular research directions are driven in large part by researchers.
I think that OpenAI leadership’s control over staffing and resources affects the organization’s overall direction, but that particular research directions are largely delegated to researchers, who have the most relevant context. OpenAI would not be able to do impactful alignment research without researchers who have a strong understanding of the field. If there were talented enough researchers who wanted to lead new alignment efforts at OpenAI, I would expect them to be enthusiastically welcomed by OpenAI leadership.
Opinion: OpenAI should be focusing more on alignment.
I think that OpenAI’s alignment research in general, and its scalable alignment research in particular, has significantly higher average social returns than its capabilities research on the margin.
Opinion: OpenAI is a great place to work to reduce existential risk from AI.
I think that the Alignment, RL, Human Data, Policy Research, Security, Applied Safety, and Trust and Safety teams are all doing work that seems useful for reducing existential risk from AI.
Since this post was written, OpenAI has done much more to communicate its overall approach to safety, making this post somewhat obsolete. At the time, I think it conveyed some useful information, although it was perceived as more defensive than I intended.
My main regret is bringing up the Anthropic split, since I was not able to do justice to the topic. I was trying to communicate that OpenAI maintained its alignment research capacity, but should have made that point without mentioning Anthropic.
Ultimately I think the post was mostly useful for sparking some interesting discussion in the comments.