This article is written solely in my personal capacity, and does not represent the views of any organisations I am affiliated with.
The UK and US governments have both secured voluntary commitments from many major AI companies on AI safety.[1]
These include having appropriate reporting mechanisms for both cybersecurity vulnerabilities and model vulnerabilities[2].
I took a look at how well organisations are living up to these commitments as of February 2024. This included reviewing what the processes actually are and submitting test reports to see whether they work.
Summary table
| Company | Score |
|---|---|
| Adobe 🇺🇸 | 6/20 |
| Amazon 🇺🇸 🇬🇧 | 6/20 |
| Anthropic 🇺🇸 🇬🇧 | 13/20 |
| Cohere 🇺🇸 | 12/20 |
| Google 🇺🇸 | 20/20 |
| Google DeepMind 🇬🇧 | 18/20 |
| IBM 🇺🇸 | 5/20 |
| Inflection 🇺🇸 🇬🇧 | 16/20 |
| Meta 🇺🇸 🇬🇧 | 14/20 |
| Microsoft 🇺🇸 🇬🇧 | 18/20 |
| NVIDIA 🇺🇸 | 20/20 |
| OpenAI 🇺🇸 🇬🇧 | 12/20 |
| Palantir 🇺🇸 | 5/20 |
| Salesforce 🇺🇸 | 9/20 |
| Scale AI 🇺🇸 | 4/20 |
| Stability AI 🇺🇸 | 1/20 |
Some high-level takeaways
Performance was quite low across the board. Simply listing a contact email and responding to queries would score 17 of the 20 available points, which would place a company in the top five.
However, a couple of companies have great processes that can serve as best-practice examples. Both Google and NVIDIA received perfect scores. In addition, Google offers bug bounty incentives for model vulnerabilities, and NVIDIA has an exceptionally clear and easy-to-use model vulnerability contact point.
Companies did much better on cybersecurity than on model vulnerabilities. Companies that combined their cybersecurity and model vulnerability procedures also scored better. This might be because existing cybersecurity processes are more battle-tested, or are taken more seriously, than model vulnerability processes.
Companies do know how to run transparent contact processes. Every single company's press contact could be found within minutes and was a simple email address. This suggests companies are able to sort this out when there are greater commercial incentives to do so.
For more details, see the full post.
[1] In the US, all companies agreed to one set of commitments, which included:
US on model vulnerabilities: Companies making this commitment recognize that AI systems may continue to have weaknesses and vulnerabilities even after robust red-teaming. They commit to establishing for systems within scope bounty systems, contests, or prizes to incent the responsible disclosure of weaknesses, such as unsafe behaviors, or to include AI systems in their existing bug bounty programs.
In the UK, each company submitted its own commitment wording. The government described the relevant areas as follows:
UK on cybersecurity: Maintain open lines of communication for feedback regarding product security, both internally and externally to your organisation, including mechanisms for security researchers to report vulnerabilities and receive legal safe harbour for doing so, and for escalating issues to the wider community. Helping to share knowledge and threat information will strengthen the overall community's ability to respond to AI security threats.
UK on model vulnerabilities: Establish clear, user-friendly, and publicly described processes for receiving model vulnerability reports drawing on established software vulnerability reporting processes. These processes can be built into – or take inspiration from – processes that organisations have built to receive reports of traditional software vulnerabilities. It is crucial that these policies are made publicly accessible and function effectively.
[2] A model vulnerability is a safety or security issue relating to an AI model that isn't directly related to its cybersecurity. This could include vulnerability to jailbreaks, prompt injection attacks, privacy attacks, unaddressed potential for misuse, controllability issues, data poisoning attacks, bias and discrimination, and general performance issues.
This is based on the definition in the UK government's paper.
Good work.
One plausibly important factor I wish you'd tracked: whether the company offers bug bounties.