My takes on SB 1047

I recently decided to sign a letter of support for SB 1047. Before deciding whether to do so, I felt it was important for me to develop an independent opinion on whether the bill was good, as opposed to deferring to the opinions of those around me, so I read through the full text of SB 1047. After forming my opinion, I checked my understanding of tort law basics (definitions of “reasonable care” and “materially contribute”) with a law professor who was recommended to me by one of the SB 1047 sponsors, but who was not directly involved in drafting the bill or lobbying for it. Ideally I would have wanted to consult with a completely independent lawyer, but this would have been prohibitively expensive and difficult on a tight timeline. This post outlines my current understanding. It is not legal advice.

My main impression of the final version of SB 1047 is that it is quite mild. Its obligations only cover models trained with $100M+ of compute, or finetuned with $10M+ of compute.[1] If a developer is training a covered model, they have to write a safety and security protocol (SSP) that explains why they believe it is not possible to use the model (or a post-train/finetune of the model costing <$10M of compute) to cause critical harm ($500M+ in damage or mass casualties). This would involve running evals, doing red teaming, etc. The SSP also has to describe what circumstances would cause the developer to shut down training and any copies of the model that the developer controls, and how they will ensure that they can actually do so if needed. Finally, a redacted copy of the SSP must be made available to the public (and an unredacted copy filed with the Attorney General). This doesn’t seem super burdensome, and is very similar to what labs are already doing voluntarily, but it seems good to codify these things because otherwise labs could stop doing them in the future. Also, current SSPs don’t make hard commitments about when to actually stop training, so it would be good to have that.

If a critical harm happens, then the question for determining penalties is whether the developer met their duty to exercise “reasonable care” to prevent models from “materially contributing” to the critical harm. This is determined by looking at how good the SSP was (both in an absolute sense and when compared to other developers) and how closely it was adhered to in practice.

Reasonable care is a well-established concept in tort law that basically means you did the cost-benefit analysis that a reasonable person would have done. Importantly, it doesn’t mean the developer has to be absolutely certain that nothing bad can happen. For example, suppose you release an open source model after doing dangerous capabilities evals to make sure it can’t make a bioweapon,[2] but then a few years later a breakthrough in scaffolding methods happens and someone makes a bioweapon using your model. As long as you were thorough in your dangerous capabilities evals, you would not be liable, because it would not have been reasonable for you to anticipate that someone would make a breakthrough that invalidates your evaluations. Also, if mitigating the risk would be too costly, and the benefit of releasing the model far outweighs the risks of release, that is a valid reason under the reasonable care standard not to mitigate the risk (e.g., the benefits of driving a car at a normal speed far outweigh the costs of car accidents, so reasonable care doesn’t require driving at 2 mph to fully mitigate the risk of car accidents). My personal opinion is that the reasonable care standard is too weak to prevent AI from killing everyone. However, this also means that I think people opposing the current version of the bill because of the reasonable care requirement are overreacting.

Materially contributing is not as well-established a concept, but my understanding is that it means the model can’t merely be helpful in causing critical harm; rather, the model must be strongly counterfactual: the critical harm would not have happened without the existence of the model. The bill also explicitly clarifies that cases where the model provides information that was publicly accessible anyway don’t count. So for example, if a terrorist uses a model to make a bioweapon, and the model provides the same advice as Google, then this doesn’t count; if it cuts the cost in half by providing more useful information than the internet, then it probably still doesn’t count, since a determined terrorist wouldn’t be deterred merely by a 2x increase in cost; if it cuts the cost by 100x, it probably does count; and if it provides advice that couldn’t have been obtained from a human expert, because all the human experts out there have moral scruples and don’t want to help make a bioweapon, it probably also counts.

The bill doesn’t affect near-term open source models, simply because they will not be powerful enough to materially contribute to critical harm. In the longer term, once models can contribute to critical harm if jailbroken, it seems very hard to ensure that safeguards cannot be removed from open source models with up to $10M of compute, even just with known attacks. But it seems pretty reasonable to me not to release models where (a) the model can do $500M of damage or cause mass casualties if safeguards are removed, (b) the safeguards can be removed for <$10M with already-known attacks, and (c) the benefits of releasing the unmitigated model do not outweigh the costs from critical harm.

There are also some provisions for whistleblower protection. Employees cannot be punished for disclosing information to the Attorney General or the Labor Commissioner, and there also needs to be an anonymous hotline for reporting information to directors/officers. This seems pretty reasonable.

While I do think federal regulation would be preferable, I’m not very sympathetic to the argument that SB 1047 should not be passed because federal regulation would be better. It seems likely that passing a federal bill similar to SB 1047 gets harder if SB 1047 fails. Also, I don’t think this regulation splits talent between the federal and state levels: aside from the Board of Frontier Models, which mostly just sets the compute thresholds, this version of SB 1047 does not create any substantial new regulatory bodies. The prospects for federal regulation seem quite uncertain, especially if a Trump presidency happens. And once strong federal regulation does pass, SB 1047 can degrade gracefully into mostly deferring to federal regulatory bodies.

I don’t think SB 1047 is nearly sufficient to prevent catastrophic risk, though it is a step in the right direction. So I think a lot of its impact will be through how it affects future AI regulation. My guess is that if SB 1047 passes, it will probably create more momentum for future AI regulation. (Also, it would be in effect some number of years earlier than federal regulation, which is especially relevant if you have shorter timelines than I do.)

  1. ^

    There are also FLOP count thresholds specified in the bill (1e26 and 3e25 FLOP, respectively), but they’re not too far off from these dollar amounts (see the back-of-envelope sketch after the footnotes), and compute quickly gets cheaper. These thresholds can be raised in the future by the Board of Frontier Models (BFM), but not lowered below $100M/$10M respectively, so the BFM could completely neuter the bill if it so chose (by raising the thresholds so high that no model would be covered).

  2. ^

    I use bioweapons as the prototypical example because it’s straightforward to reason through, but AIs helping terrorists make bioweapons is actually not really my modal story of how AIs cause catastrophic harm.
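
As a quick check on footnote 1’s claim that the FLOP thresholds roughly line up with the dollar thresholds, here is a minimal back-of-envelope sketch. The hardware and pricing figures (H100-class throughput, utilization, rental price) are my own rough assumptions rather than numbers from the bill, so treat the output as an order-of-magnitude estimate only.

```python
# Rough sanity check: what do the bill's FLOP thresholds cost at cloud prices?
# All hardware/pricing figures below are assumptions, not taken from the bill.

flop_threshold_training = 1e26  # covered-model training threshold (from the bill)
flop_threshold_finetune = 3e25  # covered fine-tune threshold (from the bill)

gpu_peak_flop_per_sec = 1e15    # assumed ~H100-class throughput (order of magnitude)
utilization = 0.4               # assumed fraction of peak achieved in practice
dollars_per_gpu_hour = 2.0      # assumed cloud rental price

flop_per_dollar = gpu_peak_flop_per_sec * utilization * 3600 / dollars_per_gpu_hour

print(f"1e26 FLOP ~ ${flop_threshold_training / flop_per_dollar / 1e6:.0f}M")
print(f"3e25 FLOP ~ ${flop_threshold_finetune / flop_per_dollar / 1e6:.0f}M")
# Under these assumptions: roughly $139M and $42M, i.e. the same order of
# magnitude as the $100M and $10M dollar thresholds.
```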