What is SB 1047 *for*?
Emmett Shear asked on twitter:
> I think SB 1047 has gotten much better from where it started. It no longer appears actively bad. But can someone who is pro-SB 1047 explain the specific chain of causal events where they think this bill becoming law results in an actual safer world? What’s the theory?
And I realized that AFAICT no one has concisely written up what the actual story for SB 1047 is supposed to be.
This is my current understanding. Other folk here may have more detailed thoughts or disagreements.
The bill isn’t sufficient on its own, but it’s not regulation for regulation’s sake, because it’s specifically a piece of the regulatory machine I’d ultimately want built.
Right now, it mostly solidifies the safety processes that existing orgs have voluntarily committed to. But, we are pretty lucky that they voluntarily committed to them, and we don’t have any guarantee that they’ll stick with them in the future.
For the bill to succeed, we do need to invent good third-party auditing processes that are not just a bureaucratic sham. This is an important, big scientific problem that isn’t solved yet, and it’s going to be a big political problem to make sure that the ones that become consensus are good instead of regulatory-captured. But figuring that out is one of the major goals of the AI safety community right now.
The “Evals Plan” as I understand it comes in two phases:
1. Dangerous Capability Evals. We invent evals that demonstrate a model is capable of dangerous things (including manipulation/scheming/deception-y things, and “invent bioweapons” type things).
As I understand it, this is pretty tractable, although labor-intensive and “difficult” in a normal, boring way. (A rough sketch of what such an eval harness might look like appears after this list.)
2. Robust Safety Evals. We invent evals that demonstrate that a model capable of scheming is nonetheless safe – either because we’ve proven what sort of actions it will choose to take (AI alignment), or because we’ve proven that we can control it even if it is scheming (AI control). AI control is probably easier at first, although limited.
As I understand it, this is very hard; people are working on it, but it will require new breakthroughs.
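To make phase 1 a bit more concrete, here is a minimal, purely illustrative sketch of what a capability-eval harness could look like. Nothing here comes from the bill or from any lab’s actual eval suite; the `Model` interface, task names, and thresholds are placeholders I made up.

```python
# A toy sketch (mine, not anything from the bill): roughly what a phase-1
# dangerous-capability eval harness could look like. The Model interface,
# task names, and thresholds are hypothetical placeholders.
from dataclasses import dataclass
from typing import Callable, Dict, List


class Model:
    """Stand-in for whatever interface a lab exposes to evaluators."""

    def generate(self, prompt: str) -> str:
        raise NotImplementedError


@dataclass
class CapabilityTask:
    name: str                      # e.g. "self_propagation_probe" (made up)
    run: Callable[[Model], float]  # returns a capability score in [0, 1]
    danger_threshold: float        # score at which we call the capability "present"


def run_capability_evals(model: Model, tasks: List[CapabilityTask]) -> Dict[str, dict]:
    """Run every task against the model and record which ones crossed their threshold."""
    results = {}
    for task in tasks:
        score = task.run(model)
        results[task.name] = {"score": score, "triggered": score >= task.danger_threshold}
    return results


def any_dangerous_capability(results: Dict[str, dict]) -> bool:
    """The phase-1 gate: did any eval cross its danger threshold?"""
    return any(r["triggered"] for r in results.values())
```

The hard, labor-intensive part is designing tasks whose scores actually track the dangerous capability; the plumbing around them is the boring part.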
The goal with SB 1047, as I understand it, is roughly:
First: Capability Evals trigger
By the time the bill triggers for the first time, we have a set of evals that are good enough to confirm “okay, this model isn’t actually capable of being dangerous” (and the AI developers probably continue unobstructed).
But when we first hit a model capable of deception, self-propagation, or bioweapon development, the eval will trigger “yep, this is dangerous.” And then the government will ask “okay, how do you know it’s not dangerous?”
And the company will put forth some plan, or internal evaluation procedure, that (probably) sucks. And the Frontier Model Board will say “hey Attorney General, this plan sucks, here’s why.”
Now, the original version of SB 1047 would have had the Attorney General saying “okay yeah, your plan doesn’t make sense, you don’t get to build your model.” The newer version of the bill, I think, basically requires additional political work at this phase.
But the goal of this phase is to establish “hey, we have dangerous AI, and we don’t yet have the ability to reasonably demonstrate we can render it non-dangerous,” and to stop development of AI until companies figure out plans that at _least_ make enough sense to government officials.
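For concreteness, here is a minimal sketch of the decision flow this phase is meant to produce, in my own framing rather than the bill’s language; the “reviewers” step is just a stand-in for the Frontier Model Board / Attorney General process described above.

```python
# A toy sketch (my framing, not the bill's text) of the phase-one decision flow
# described above. "Reviewers" stands in for the Frontier Model Board / Attorney
# General step; the whole thing is illustrative, not a real legal process.
from enum import Enum


class Decision(Enum):
    PROCEED = "proceed"                      # evals say the model isn't dangerously capable
    PROCEED_WITH_PLAN = "proceed_with_plan"  # dangerous capability, but the safety plan was accepted
    HALT_AND_ESCALATE = "halt_and_escalate"  # dangerous capability and no credible plan


def phase_one_decision(dangerous_capability_found: bool,
                       plan_submitted: bool,
                       plan_accepted_by_reviewers: bool) -> Decision:
    """Capability evals trigger -> the company submits a plan -> reviewers judge it."""
    if not dangerous_capability_found:
        return Decision.PROCEED
    if plan_submitted and plan_accepted_by_reviewers:
        return Decision.PROCEED_WITH_PLAN
    return Decision.HALT_AND_ESCALATE
```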
Second: Advanced Evals are invented, and get woven into law
The way I expect a company to prove their AI is safe, despite having dangerous capabilities, is for third parties to invent a robust version of the second set of evals, and then for new AIs to pass those evals.
This requires a lot of scientific and political labor, and the hope is that by the time we’ve triggered the “dangerous” eval, the government is paying more explicit attention, which makes it easier to have a conversation about what the long-term plan is.
SB 1047 is the specific tripwire by which the government will be forced to pay more attention at an important time.
My vague understanding atm is that Biden signed some similar-ish executive orders, but that there’s a decent chance Trump reverses them.
So SB 1047 may be the only safeguard we have for ensuring this conversation happens at the government level at the right time, even if future companies are even less safe-seeming than the current leading labs, or the current leading labs shortchange their current (relatively weak) pseudo-commitments.
Curious if anyone has different takes or more detailed knowledge.
See this Richard Ngo post on what makes a good eval, which I found helpful.