Here are the extracts naming the companies and the specific commitments:
As part of this commitment, President Biden is convening seven leading AI companies at the White House today – Amazon, Anthropic, Google, Inflection, Meta, Microsoft, and OpenAI – to announce that the Biden-Harris Administration has secured voluntary commitments from these companies to help move toward safe, secure, and transparent development of AI technology.
Today, these seven leading AI companies are committing to:
Ensuring Products are Safe Before Introducing Them to the Public
The companies commit to internal and external security testing of their AI systems before their release. This testing, which will be carried out in part by independent experts, guards against some of the most significant sources of AI risks, such as biosecurity and cybersecurity, as well as its broader societal effects.
The companies commit to sharing information across the industry and with governments, civil society, and academia on managing AI risks. This includes best practices for safety, information on attempts to circumvent safeguards, and technical collaboration.
Building Systems that Put Security First
The companies commit to investing in cybersecurity and insider threat safeguards to protect proprietary and unreleased model weights. These model weights are the most essential part of an AI system, and the companies agree that it is vital that the model weights be released only when intended and when security risks are considered.
The companies commit to facilitating third-party discovery and reporting of vulnerabilities in their AI systems. Some issues may persist even after an AI system is released and a robust reporting mechanism enables them to be found and fixed quickly.
Earning the Public’s Trust
The companies commit to developing robust technical mechanisms to ensure that users know when content is AI generated, such as a watermarking system. This action enables creativity with AI to flourish but reduces the dangers of fraud and deception.
The companies commit to publicly reporting their AI systems’ capabilities, limitations, and areas of appropriate and inappropriate use. This report will cover both security risks and societal risks, such as the effects on fairness and bias.
The companies commit to prioritizing research on the societal risks that AI systems can pose, including on avoiding harmful bias and discrimination, and protecting privacy. The track record of AI shows the insidiousness and prevalence of these dangers, and the companies commit to rolling out AI that mitigates them.
The companies commit to develop and deploy advanced AI systems to help address society’s greatest challenges. From cancer prevention to mitigating climate change to so much in between, AI—if properly managed—can contribute enormously to the prosperity, equality, and security of all.
As we advance this agenda at home, the Administration will work with allies and partners to establish a strong international framework to govern the development and use of AI. It has already consulted on the voluntary commitments with Australia, Brazil, Canada, Chile, France, Germany, India, Israel, Italy, Japan, Kenya, Mexico, the Netherlands, New Zealand, Nigeria, the Philippines, Singapore, South Korea, the UAE, and the UK. The United States seeks to ensure that these commitments support and complement Japan’s leadership of the G-7 Hiroshima Process—as a critical forum for developing shared principles for the governance of AI—as well as the United Kingdom’s leadership in hosting a Summit on AI Safety, and India’s leadership as Chair of the Global Partnership on AI. We also are discussing AI with the UN and Member States in various UN fora.
This overall seems like good news, with the exception of commitment 8) in the associated fact sheet:
This seems like a proactive commitment to build frontier models, which seems to me like the most relevant dimension (I don’t think the safety commitments will make a huge difference in whether systems of a given capability level will actually kill everyone), and it seems to actively commit these companies to develop things in this space.
My guess is it doesn’t make a huge difference, since the other commitments are fuzzy enough that any company that wanted to slow down could do so by saying it needed to in order to meet the other seven commitments. But the thing I most want to see is public commitments to slow down, and this does feel like a missed opportunity to make them.
I largely agree. But note that
doesn’t necessarily mean scary general-agents. E.g. “early cancer detection and prevention” is clearly non-scary, and the examples in DeepMind’s recent blogpost on AI for climate change mitigation are weather forecasting, animal behavior forecasting, and energy efficiency.
From OpenAI’s post:
This seems like a big deal, in particular because it seems to foreshadow regulation:
OpenAI post with more details here.
I’m sorry, but I don’t see anything in there that meaningfully reduces my chances of being paperclipped. Not even if they were followed universally.
I don’t even see much that really reduces the chances of people (smart enough to act on them) getting bomb-making instructions almost as good as the ones freely available today, or of systems producing words or pictures that might hurt people emotionally (unless they get paperclipped first).
I do notice a lot of things that sound convenient for the commercial interests and business models of the people who were there to negotiate the list. And I notice that the list is pretty much a license to blast ahead on increasing capability, without any restrictions on how you get there. Including a provision that basically cheerleads for building anything at all that might be good for something.
There’s really only one concrete action in there involving the models themselves. The White House calls it “testing”, but OpenAI mutates it into “red-teaming”, which narrows it quite a bit. Not that anybody has any idea how to test any of this using any approach. And testing is NOT how you create secure, correct, or not-everyone-killing software. The stuff under the heading of “Building Systems that Put Security First”… isn’t. It’s about building an arbitrarily dangerous system and trying to put walls around it.
Just to organize this:
Summary: every point hugely advantages a small concentration of wealthy AI companies, and once these become legal requirements they will entrench those companies indefinitely. And it in no way slows capabilities; in fact, it seems to implicitly give permission to push them as far as possible.
The companies commit to internal and external security testing of their AI systems before their release. This testing, which will be carried out in part by independent experts, guards against some of the most significant sources of AI risks, such as biosecurity and cybersecurity, as well as its broader societal effects.
Effect on rich companies: they need to do extensive testing anyway to deliver competitive products. Capabilities go hand in hand with reliability: every capable tool humanity uses is highly reliable.
Effect on poor companies: the testing burden prevents them from gaining early revenue on a shoddy product, preventing them from competing at all.
Effect on advancing capabilities: minimal
The companies commit to sharing information across the industry and with governments, civil society, and academia on managing AI risks. This includes best practices for safety, information on attempts to circumvent safeguards, and technical collaboration.
Effect on rich companies: they need to pay for another group of staff/internal automation tools to deliver these information-sharing reports, carefully scripted to look good and not reveal more than the legal minimum.
Effect on poor companies: the reporting burden reduces their runway further, preventing all but a few extremely well-funded startups from existing at all.
Effect on advancing capabilities: minimal
The companies commit to investing in cybersecurity and insider threat safeguards to protect proprietary and unreleased model weights. These model weights are the most essential part of an AI system, and the companies agree that it is vital that the model weights be released only when intended and when security risks are considered.
Effect on rich companies: they already want to do this; it is how they protect their IP.
Effect on poor companies: the security burden reduces their runway further.
Effect on advancing capabilities: minimal
The companies commit to facilitating third-party discovery and reporting of vulnerabilities in their AI systems. Some issues may persist even after an AI system is released and a robust reporting mechanism enables them to be found and fixed quickly.
This is the same as the reporting case above.
The companies commit to developing robust technical mechanisms to ensure that users know when content is AI generated, such as a watermarking system. This action enables creativity with AI to flourish but reduces the dangers of fraud and deception.
Effect on rich companies: they already want to avoid legal responsibility for the use of AI in deception. Stripping the watermark puts the liability on the scammer.
Effect on poor companies: watermarks slightly reduce their runway.
Effect on advancing capabilities: minimal
The companies commit to publicly reporting their AI systems’ capabilities, limitations, and areas of appropriate and inappropriate use. This report will cover both security risks and societal risks, such as the effects on fairness and bias.
This is another form of reporting; same effects as above.
The companies commit to prioritizing research on the societal risks that AI systems can pose, including on avoiding harmful bias and discrimination, and protecting privacy. The track record of AI shows the insidiousness and prevalence of these dangers, and the companies commit to rolling out AI that mitigates them.
Effect on rich companies: now they need yet another internal group doing this research, at each AI company.
Effect on poor companies: having to pay for another required internal group reduces their runway.
Effect on advancing capabilities: minimal
The companies commit to develop and deploy advanced AI systems to help address society’s greatest challenges. From cancer prevention to mitigating climate change to so much in between, AI—if properly managed—can contribute enormously to the prosperity, equality, and security of all.
Effect on rich companies: this is carte blanche to do what they were already planning to do. It also says ‘fuck AI pauses’, direct from the Biden administration. GPT-4 is not nearly capable enough to solve any of “society’s greatest challenges”: it’s missing modalities and the general ability to get anything but the simplest tasks accomplished reliably. Adding those components will take far more compute, such as the multi-exaflop AI supercomputers everyone is building, which will obviously allow models that dwarf GPT-4.
Effect on poor companies: well, they aren’t competing with megamodels, but they were already screwed by the other points.
Effect on advancing capabilities: large; this is the one point that actively encourages pushing capabilities as far as possible.
My quick take is that this is not that great, and I’ll explain why below.
First, the scope is quite bad, and I want to explain why:
I consider this a bad scope because it burdens much safer models with regulation while leaving RL, especially deep RL, mostly unregulated, and I worry this will set a dangerous precedent. I learned something important from reading porby’s posts: it is actually good that, by and large, generative models/simulators/predictors have come first. At the very least, they are far easier to handle than RL from a non-misuse extinction-risk perspective, since they have many nice safety features, such as a lack of instrumental goals (because the training signal is densely informative), and are in general much easier alignment targets.
I really hope this gets fixed soon, because I fear it will place adversarial pressure on safe AI by default, especially given condition 8 below.
This seems like it’s throwing a bone to the e/acc inside them, at least in part, because it lets them justify arbitrary capabilities research as long as they can meet this commitment. It also seems like an indication that any regulation will probably not ban AI progress, or perhaps even slow AI down all that much, because of this here:
Condition 5 is impossible, and how they deal with that impossibility will plausibly determine how AI goes.
Here’s the condition below:
The reason I believe this is the study below, which importantly gives various impossibility results:
https://arxiv.org/pdf/2303.11156.pdf
So the AI companies may have committed themselves to an impossible task. The question is, what’s going to happen next?
OpenAI post below:
https://openai.com/blog/moving-ai-governance-forward
About the impossibility result, if I understand correctly, that paper says two things (I’m simplifying and eliding a great deal):
You can take a recognizable, possibly watermarked output of one LLM, use a different LLM to paraphrase it, and not be able to detect the second LLM’s output as coming from (transforming) the first LLM.
In the limit, any classifier that tries to detect LLM output can be beaten by an LLM that is sufficiently good at generating human-like output. There’s evidence that LLMs can soon become that good. And since emulating human output is an LLM’s main job, capabilities researchers and model developers will make them that good.
The second point is true but not directly relevant: OpenAI et al. are committing not to make models whose output is indistinguishable from that of humans.
The first point is true, BUT the companies have not committed themselves to defeating it. Their own models’ output is clearly watermarked, and they will provide reliable tools to identify those watermarks. If someone else then provides a model that is good enough at paraphrasing to remove that watermark, that is that someone else’s fault, and they are effectively not abiding by this industry agreement.
If open source / widely available non-API-gated models become good enough at this to render the watermarks useless, then the commitment scheme will have failed. This is not surprising; if ungated models become good enough at anything contravening this scheme, it will have failed.
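To make the mechanics concrete, here is a toy sketch of the kind of statistical watermark under discussion, loosely in the style of the “green list” scheme from Kirchenbauer et al. Everything here (the vocabulary, the hash-based partition, the hard always-pick-green generator) is my own simplified assumption, not any company’s actual system: the generator samples from a pseudorandom “green” half of the vocabulary keyed on the previous token, and the detector measures the green fraction.

```python
import hashlib
import random

VOCAB = [f"w{i}" for i in range(1000)]  # toy vocabulary

def green_list(prev_token, fraction=0.5):
    """Pseudorandomly partition the vocabulary, seeded by the previous token."""
    seed = int(hashlib.sha256(prev_token.encode()).hexdigest(), 16) % (2**32)
    rng = random.Random(seed)
    shuffled = VOCAB[:]
    rng.shuffle(shuffled)
    return set(shuffled[: int(len(shuffled) * fraction)])

def generate_watermarked(length, seed=0):
    """Toy generator: always samples from the green list (a 'hard' watermark;
    real schemes only bias the logits toward green)."""
    rng = random.Random(seed)
    tokens = ["<s>"]
    for _ in range(length):
        tokens.append(rng.choice(sorted(green_list(tokens[-1]))))
    return tokens[1:]

def green_fraction(tokens):
    """Detection statistic: fraction of tokens that fall in their green list.
    Near 1.0 for watermarked text, ~0.5 for independent text."""
    prev, hits = "<s>", 0
    for tok in tokens:
        hits += tok in green_list(prev)
        prev = tok
    return hits / len(tokens)
```

A paraphraser that rewrites tokens without knowing the secret hash drives `green_fraction` back toward 0.5, which is exactly the point made above: detection is a statistical test, and anything that resamples the text erases the statistic.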
There are tacit but very necessary assumptions in this approach and it will fail if any of them break:
The ungated models released so far (e.g. llama) don’t contain forbidden capabilities, including output and/or paraphrasing that is indistinguishable from human (but also, of course, notkillingeveryone), and won’t be improved to include them by ‘open source’ tinkering that doesn’t come from large industry players.
No one worldwide will release new, more capable models, or sell ungated access to them, in violation of this industry agreement; and if anyone does, the agreement will be enforced (somehow).
The inevitable use of more capable models (which would be illegal to release publicly) by some governments, militaries, etc. will not result in the public release of such capabilities; and their inevitable use of e.g. indistinguishable-from-human output will not cause such (public) problems that this commitment not to let private actors do it becomes meaningless.
A more recent paper shows that an equally strong model is not needed to break watermarks through paraphrasing. It suffices to have a quality oracle and a model that achieves equal quality with positive probability.
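That attack can be caricatured in a few lines: random-walk over the text, accepting only perturbations the quality oracle approves, until the detector stops firing. The detector, quality oracle, and synonym table below are toy stand-ins of my own, not the paper’s actual construction:

```python
import random

def watermark_detected(text, marked_tokens, threshold=0.3):
    """Stand-in detector: fires when the fraction of 'marked' tokens is high."""
    return sum(t in marked_tokens for t in text) / len(text) >= threshold

def erase_watermark(text, detector, quality, synonyms, rng=None, max_steps=10_000):
    """Random-walk attack: repeatedly swap one token for a synonym, keeping the
    edit only if the quality oracle still accepts, until the detector goes quiet."""
    rng = rng or random.Random(0)
    text = list(text)
    for _ in range(max_steps):
        if not detector(text):
            return text  # watermark erased, quality preserved
        i = rng.randrange(len(text))
        candidate = text[:i] + [rng.choice(synonyms[text[i]])] + text[i + 1:]
        if quality(candidate):
            text = candidate
    return None  # attack failed within the step budget

# Toy setup: "m3" and "u3" mean the same thing; only the m-forms carry the mark.
MARKED = {f"m{i}" for i in range(10)}
SYNONYMS = {f"{c}{i}": [f"m{i}", f"u{i}"] for i in range(10) for c in "mu"}

def quality(text):
    """Stand-in quality oracle: the sequence of meanings (digits) must survive."""
    return [t[1:] for t in text] == [str(i) for i in range(10)]

def detector(text):
    return watermark_detected(text, MARKED)

stripped = erase_watermark([f"m{i}" for i in range(10)], detector, quality, SYNONYMS)
```

The attacker never needs to generate watermark-quality text from scratch; it only needs local edits that the quality oracle accepts, which is why the result is framed as an impossibility for strong watermarks.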
Uhm, fuck yeah?
If someone had told me this two years ago I wouldn’t have believed it. Kinda feels like we’re doing a Dr Strange on the timeline right now.