Re: Anthropic’s suggested SB-1047 amendments

Link post

Note: I received a link to the letter from elsewhere, but it’s also cited in this SF Chronicle article, so I’m pretty confident it’s real. Thanks to @cfoster0 for the SF Chronicle link.

If you’re familiar with SB 1047, I recommend reading the letter in full; it’s only 7 pages.

I’ll go through their list of suggested changes and briefly analyze them, and then make a couple of high-level points. (I am not a lawyer and nothing written here is legal advice.)

Major Changes

Greatly narrow the scope of pre-harm enforcement to focus solely on (a) failure to develop, publish, or implement an SSP[1] (the content of which is up to the company); (b) companies making materially false statements about an SSP; (c) imminent, catastrophic risks to public safety.

Motivated by the following concern laid out earlier in the letter:

The current bill requires AI companies to design and implement SSPs that meet certain standards – for example they must include testing sufficient to provide a “reasonable assurance” that the AI system will not cause a catastrophe, and must “consider” yet-to-be-written guidance from state agencies. To enforce these standards, the state can sue AI companies for large penalties, even if no actual harm has occurred. While this approach might make sense in a more mature industry where best practices are known, AI safety is a nascent field where best practices are the subject of original scientific research. For example, despite a substantial effort from leaders in our company, including our CEO, to draft and refine Anthropic’s RSP over a number of months, applying it to our first product launch uncovered many ambiguities. Our RSP was also the first such policy in the industry, and it is less than a year old. What is needed in such a new environment is iteration and experimentation, not prescriptive enforcement. There is a substantial risk that the bill and state agencies will simply be wrong about what is actually effective in preventing catastrophic risk, leading to ineffective and/​or burdensome compliance requirements.

While SB 1047 doesn’t prescribe object-level details for how companies need to evaluate models for their likelihood of causing critical harms, it does establish some requirements for the structure of such evaluations (22603(a)(3)).

Section 22603(a)(3)

(3) Implement a written and separate safety and security protocol that does all of the following:

(A) If a developer complies with the safety and security protocol, provides reasonable assurance that the developer will not produce a covered model or covered model derivative that poses an unreasonable risk of causing or enabling a critical harm.

(B) States compliance requirements in an objective manner and with sufficient detail and specificity to allow the developer or a third party to readily ascertain whether the requirements of the safety and security protocol have been followed.

(C) Identifies specific tests and test results that would be sufficient to provide reasonable assurance of both of the following:

  1. That a covered model does not pose an unreasonable risk of causing or enabling a critical harm.

  2. That covered model derivatives do not pose an unreasonable risk of causing or enabling a critical harm.

(D) Describes in detail how the testing procedure assesses the risks associated with post-training modifications.

(E) Describes in detail how the testing procedure addresses the possibility that a covered model can be used to make post-training modifications or create another covered model in a manner that may generate hazardous capabilities.

(F) Provides sufficient detail for third parties to replicate the testing procedure.

(G) Describes in detail how the developer will fulfill their obligations under this chapter.

(H) Describes in detail how the developer intends to implement the safeguards and requirements referenced in this section.

(I) Describes in detail the conditions under which a developer would enact a full shutdown.

(J) Describes in detail the procedure by which the safety and security protocol may be modified.

The current bill would allow the AG to “bring a civil action” to enforce any provision of the bill. One could look at the requirement to develop tests that provide a reasonable assurance that the covered model “does not pose an unreasonable risk of causing or enabling a critical harm”, and think that one of the potential benefits of the current bill is that if a company submits a grossly inadequate testing plan, the AG could take them to court (with a range of remedies that include model shutdown and deletion of weights). How likely is it that this benefit would be realized? That is extremely unclear, and it might depend substantially on the composition of the Frontier Model Division.

Removing this from the bill removes the main mechanism by which the bill hopes to be able to proactively prevent catastrophic harms. (Some harms are difficult to seek remedies for after the fact.) Of course, this is also the mechanism by which the government might impose unjustified economic costs.

Introduce a clause stating that if a catastrophic event does occur (which continues to be defined as mass casualties or more than $500M in damage), the quality of the company’s SSP should be a factor in determining whether the developer exercised “reasonable care.” This implements the notion of deterrence: companies have wide latitude in developing an SSP, but if a catastrophe happens in a way that is connected to a defect in a company’s SSP, then that company is more likely to be liable for it.

This is doing a lot of the heavy lifting as far as replacing the previous mechanism for mitigating catastrophic harms, but it’s not clear to me how the quality of the SSP is supposed to be determined (or by whom). If it’s the courts, I’m not sure that’s better than an average counterfactual FMD determination. (I think it’s less likely that courts are explicitly captured, but they’re also ~guaranteed to not contain any domain experts.)

Eliminate the Frontier Model Division (Section 11547.6). With pre-harm enforcement sharply limited and no longer prescriptive about standards, the FMD is no longer needed. This greatly reduces the risk surface for ambiguity in how the bill is interpreted, and makes its effects more objective and predictable. In lieu of having an FMD, assign authority to the Government Operations Agency to raise the threshold (initially 10^26 FLOPS and >$100M) for covered models through a notice and comment process to further narrow the scope of covered models as we learn more about risk and safety characteristics of large models over time.

This makes sense as an extension of the first suggestion. If you’re going to switch to a tort-like incentive structure, there isn’t much point in having the Frontier Model Division.
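The initial covered-model threshold the letter references (10^26 FLOPS and >$100M in training cost) can be sketched as a simple conjunctive check. This is purely illustrative: `is_covered` is a hypothetical helper, and the actual statutory definition is more involved (e.g. it also addresses covered model derivatives and fine-tuning).

```python
# Hypothetical sketch of the initial "covered model" threshold described in
# the letter: a model is covered only if its training run exceeded BOTH the
# compute threshold (1e26 FLOPs) and the cost threshold ($100M). The real
# statutory definition has additional nuance; this is just an illustration.

FLOP_THRESHOLD = 1e26
COST_THRESHOLD_USD = 100_000_000


def is_covered(training_flops: float, training_cost_usd: float) -> bool:
    """Return True if a model exceeds both initial thresholds."""
    return (training_flops > FLOP_THRESHOLD
            and training_cost_usd > COST_THRESHOLD_USD)


print(is_covered(2e26, 300_000_000))  # large frontier run -> True
print(is_covered(5e25, 300_000_000))  # under the compute threshold -> False
```

Note that because the test is conjunctive, raising either threshold (as the proposed notice-and-comment process would allow) narrows the set of covered models.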

Eliminate Section 22605 (uniform pricing for compute and AI models), which is unrelated to the primary goal of preventing catastrophic risks. It may have unintended consequences for market dynamics in the AI and cloud computing sectors.

This section is almost certainly just pork for Economic Security California Action (one of the bill’s three co-sponsors). It’s actually even worse than it sounds, since it seems to force anyone operating a compute cluster (as defined in the bill) to also sell access to it, even if they aren’t already a cloud provider. It also requires anyone selling model access to do so in a way that doesn’t “engage in unlawful discrimination or noncompetitive activity in determining price or access”. All else equal I’d be happy to see this removed (or at least substantially amended), but I don’t know how the realpolitik plays out.

Eliminate Section 22604 (know-your-customer for large cloud compute purchases), which duplicates existing federal requirements and is outside the scope of developer safety.

I don’t have a very confident take here. If it’s true that the proposed KYC rules duplicate existing federal requirements (and those federal requirements aren’t the result of a flimsy Executive Order that could get repealed by the next president), then getting rid of them seems fine. KYC is costly. In principle KYC isn’t necessary to give decisionmakers the ability to e.g. stop a training run, but in practice our government(s) might not be able to operate that way. Seems like a question that needs more analysis.

Narrow Section 22607 to focus on whistleblowing by employees that relates to false statements or noncompliance with the company’s SSP. Whistleblowing protections make sense and are common in federal and state law, but the language as drafted is too broad and could lead to spurious “whistleblowing” that leaks IP or disrupts companies for reasons unrelated or very tenuously related to catastrophic risk. False statements about an SSP are the area where proactive enforcement remains in our proposal, so it is logical that whistleblower protections focus on this area in order to aid with enforcement. The proposed changes are in line with, and are not intended to limit, existing whistleblower protections under California’s Labor Code.

The current bill would forbid developers of covered models (as well as their contractors and subcontractors) from preventing employees from disclosing information to the AG, “if the employee has reasonable cause to believe either of the following”:

(a) The developer is out of compliance with the requirements of Section 22603.

(b) An artificial intelligence model, including a model that is not a covered model, poses an unreasonable risk of causing or materially enabling critical harm, even if the employer is not out of compliance with any law.

The first major suggested change would eliminate much of 22603, so (a) would be less relevant, but (b) seems like it could be valuable in most possible worlds. I’m sympathetic to concerns about IP leaking, since that’s one way things might go badly wrong, but it’s pretty striking to suggest that it would be appropriate for a company to forbid employees from talking to the AG when they have reasonable cause to believe that a model the company is working on poses an unreasonable risk of causing or enabling a critical harm. One line of reasoning might go something like, “well, we have a lot of employees, and in the limit it seems pretty likely that at least one of them will make a wildly incorrect judgment call about a model that everyone else at the company thinks is safe”. I think the solution to unilateralist’s-curse-type concerns is to figure out how to reduce the potential harm from such “false positive” disclosures.

Minor Changes

Lowering the expectations for completely precise and independently reproducible testing procedures. Our experience is that policies like SSPs are wet clay and companies are still learning and iterating rapidly on them—if we are overly prescriptive now, we risk “locking the industry in” to poor practices for the long-term. As frontier model training runs may last several months, it is also impractical to state comprehensively and reproducibly the details of all predeployment tests that will be run before initiating a months-long training run.

I’m not really sure I understand the first objection here. Is their claim that forcing labs to publish precise and reproducible testing procedures incurs a greater risk of the industry converging on the wrong testing procedures too early, compared to allowing labs to publish less precise and reproducible testing procedures? I can imagine that kind of convergence happening, but I’m not sure that it’s more likely if the published procedures are detailed enough to be reproducible.

I think I am less sympathetic to the second objection. It’s true that an “adequate” testing procedure would be fairly involved. But if you can’t publish a precise and reproducible procedure without doing a lot of additional work, I am skeptical that you can reliably execute that procedure yourself.

Removing a potential catch-22 where existing bill text could be interpreted as preventing external testing of a model before a model was tested.

If that’s indeed in the bill, seems good to remove. (I’ve read the bill and didn’t catch it, but there were a lot of issues that others caught and I didn’t.)

EDIT: seems like this is probably referring to section 22603(b)(1):

(b) Before using a covered model or covered model derivative, or making a covered model or covered model derivative available for commercial or public use, the developer of a covered model shall do all of the following:

  (1) Assess whether the covered model is reasonably capable of causing or enabling a critical harm.

This might not literally be a catch-22, since you could in principle imagine methods of testing for model capabilities that don’t require inference (which is what I imagine is meant by “using”). But I don’t think that’s the intended reading and the wording should be clarified.

Removing mentions of criminal penalties or legal terms like “perjury” which are not essential to achieving the primary objectives of the legislation.

This is probably just a PR suggestion, since a lot of people have been freaking out about a pretty standard clause in the bill. In practice I mostly expect the clause to be a nothingburger, so I don’t feel terribly strongly about keeping it, but I do think the bill needs some way to enforce that companies are actually following their published SSPs.

Modifying the “critical harms” definition to clarify that military or intelligence operations in line with the national security objectives of the United States are excluded, and also to remove a vague catch-all critical harm provision. This prevents a company from being liable for authorized government use of force. There is room for debate about the use of AI for military and intelligence objectives. However, we believe the federal level, where responsibility lies for foreign and defense policy, rather than state governments, is the more appropriate forum for such a debate.

I am mostly not concerned about “intentional” harm. I don’t know which catch-all they’re referring to.

Requiring developers of covered models (>$100M) to publish a public version of an SSP, redacted as appropriate, and retain a copy for five years, in place of filing SSPs (and various other documents) with the FMD (which we have proposed eliminating, as noted above).

Compatible with their previous suggestions.

Removing all whistleblower requirements that refer to “any contractor or subcontractor” of the developer of a covered model. This would seem to include anything from data labelers to food vendors. We do not think this bill should introduce new requirements to such a wide swath of businesses, covering thousands to potentially hundreds of thousands of contractors and the contract company employees at large developers. The bill should focus on the direct employees of model developers. Existing whistleblower protections in the Labor Code only extend to employees.

This requirement does impose substantial costs for non-obvious benefits, if you’re mostly concerned about whistleblowers being able to report either concerns about SSPs not being followed, or more general concerns about catastrophic risks. There might be a concern about labs trying to play shell games with multiple entities, but on priors I don’t actually expect labs to try (and get away with) setting up some kind of corporate structure such that the entity doing the training isn’t the entity that employs the researchers and engineers best positioned to report their concerns. (I’m not that confident here, though.)

Other Thoughts

The letter doesn’t seem to be proposing the kinds of changes one might expect if averting existential risk were a major concern. In one sense, this isn’t surprising, since SB 1047 itself seemed somewhat confused on that question. But the AG’s ability to sue based on inadequate SSPs (before harm has occurred), reproducible testing plans, and broad whistleblower protections are provisions with trade-offs that make more sense if you’re trying to prevent an irrecoverable disaster.

I remain pretty uncertain about the sign of the overall bill in its current state. If all of the proposed changes were adopted, I’d expect the bill to have much less effect on the world (either positive or negative). Given my risk models I think more variance is probably good, so I’d probably take the gamble with the FMD, but I wouldn’t be that happy about it. I think section 22605 should be removed.

Many of the considerations here were brought up by others; credit goes substantially to them.

  1. ^

    Safety and Security Protocols, as defined in the bill.