Non-Disparagement Canaries for OpenAI
Since at least 2017, OpenAI has asked departing employees to sign offboarding agreements which legally bind them to permanently—that is, for the rest of their lives—refrain from criticizing OpenAI, or from otherwise taking any actions which might damage its finances or reputation.[1]
If they refused to sign, OpenAI threatened to take back (or make unsellable) all of their already-vested equity—a huge portion of their overall compensation, which often amounted to millions of dollars. Given this immense pressure, it seems likely that most employees signed.
If they did sign, they became personally liable forevermore for any financial or reputational harm they later caused. This liability was unbounded, so it had the potential to be financially ruinous—if, say, they later wrote a blog post critical of OpenAI, they might in principle be found liable for damages far in excess of their net worth.
These extreme provisions allowed OpenAI to systematically silence criticism from its former employees, of whom there are now hundreds working throughout the tech industry. And since the agreement also prevented signatories from even disclosing that they had signed it, their silence was easy to misinterpret as evidence that they had no notable criticisms to voice.
We were curious about who may have been silenced in this way, and where they work now, so we assembled an (incomplete) list of former OpenAI staff.[2] From what we were able to find, it appears that over 500 people may have signed these agreements, of whom only 5 have publicly reported being released so far.[3]
We were especially alarmed to notice that the list contains a number of former employees currently working on safety evaluations or AI policy.[4][5] This includes some in leadership positions, for example:
Bilva Chandra (Senior AI Policy Advisor, NIST)
Charlotte Stix (Head of Governance, Apollo Research)
Jack Clark (Co-Founder [focused on policy and evals], Anthropic)
Jade Leung (CTO, UK AI Safety Institute)
Paul Christiano (Head of Safety, US AI Safety Institute)
In our view, it seems hard to trust that people could effectively evaluate or regulate AI while under a strict legal obligation to refrain from sharing critical evaluations of a top AI lab, or from taking any other actions which might make the company less valuable (as many regulations presumably would). So if any of these people are not subject to these agreements, we encourage them to say so in public.
It is rare for company offboarding agreements to contain provisions this extreme—especially those which prevent people from even disclosing that the agreement itself exists. But such provisions are relatively common in the American intelligence industry. The NSA periodically forces telecommunications providers to reveal their clients’ data, for example, and when it does, the providers are typically prohibited from disclosing that this ever happened.
In response, some companies began listing warrant canaries on their websites—sentences stating that they had never yet been forced to reveal any client data. If at some point they did receive such a warrant, they could then remove the canary without violating their legal non-disclosure obligation, thereby allowing the public to gain indirect evidence about this otherwise-invisible surveillance.
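To make the mechanics concrete, here is a minimal sketch in Python of how a reader might monitor a canary. The URL and the canary sentence are hypothetical placeholders, not details from any real provider:

```python
import sys
import urllib.request

# Hypothetical placeholders: no real provider or canary text is implied.
CANARY_URL = "https://example.com/canary.txt"
CANARY_TEXT = "As of 2024-06-01, we have never received a warrant for client data."

def canary_present(url: str, text: str) -> bool:
    """Fetch the canary page and report whether the statement still appears."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        page = resp.read().decode("utf-8", errors="replace")
    return text in page

if __name__ == "__main__":
    if canary_present(CANARY_URL, CANARY_TEXT):
        print("Canary present: the provider still attests it has received no warrant.")
    else:
        # The canary's removal is the signal: the provider stays silent
        # (satisfying its gag order) while the public still learns something.
        print("Canary missing: indirect evidence that a warrant may have been served.")
        sys.exit(1)
```

The load-bearing design choice is that removing a statement requires saying nothing at all, so the signal reaches the public without the provider ever breaching its non-disclosure obligation.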
Until recently, OpenAI succeeded in preventing hundreds of its former employees from ever being able to criticize it, and prevented most others—including many of its current employees!—from realizing this was even happening. After Kelsey Piper’s recent reporting, OpenAI sent emails to some former employees releasing them from their non-disparagement obligations. But given how few people have publicly confirmed being released so far, it seems likely these emails weren’t sent to everyone. And since the NDA covers the non-disparagement provision itself, it’s hard to be confident that someone has been released unless they clearly say so.
So we propose adopting non-disparagement canaries—if you are a former employee of OpenAI and aren’t subject to these obligations, you are welcome to leave a comment below (or email us), and we’ll update your entry on the spreadsheet. The more people do this, the more information we’ll have about who remains silenced.
[6/1/24: Jacob Hilton argues we interpreted the non-interference provision too broadly—that it was meant just to prohibit stealing OpenAI’s business relationships, not to prohibit anything that would harm its business more generally. We aren’t lawyers, and aren’t confident he’s wrong; if we come to think he’s right, we’ll update the post.]
You can read the full documents at the bottom of Kelsey Piper’s excellent report, but here are some key excerpts:
Non-Disclosure: “Employee agrees that Employee will now and forever keep the terms and monetary settlement amount of this Agreement completely confidential, and that Employee shall not disclose such to any other person directly or indirectly.”
Liability: “Employee agrees that the failure to comply with… the confidentiality, non-disparagement, non-competition, and non-solicitation obligations set forth in this Agreement shall amount to a material breach of this Agreement which will subject Employee to the liability for all damages OpenAI might incur.”
Non-Interference: “Employee agrees not to interfere with OpenAI’s relationship with current or prospective employees, current or previous founders, portfolio companies, suppliers, vendors or investors. Employee also agrees to refrain from communicating any disparaging, defamatory, libelous, or derogatory statements, in a manner reasonably calculated to harm OpenAI’s reputation, to any third party regarding OpenAI or any of the other Releasees.”
Thank you to AI Watch for providing some of this data.
In total, 7 people have publicly reported not being subject to the terms; four are named here, and the remaining three appear in the edit note below. Daniel Kokotajlo was offered the agreement but didn’t sign; Gretchen Krueger, Cullen O’Keefe, and Evan Hubinger are not subject to the agreement, either because they didn’t sign it or because it wasn’t offered to them.
Assuming former board members were expected to sign similar agreements, Helen Toner (Director of Strategy, Center for Security and Emerging Technology) may be subject to non-disparagement as well; Holden Karnofsky (Visiting Scholar, Carnegie Endowment for International Peace) confirms that he didn’t sign.
Edited to remove Chris Painter (Head of Policy, METR), Geoffrey Irving (Research Director, UK AI Safety Institute), and Remco Zwetsloot (Executive Director, Horizon Institute for Public Service), who report not signing the agreement; and Beth Barnes (Head of Research, METR), who reports being recently released.