Notwithstanding the tendentious assumption in the other comment thread that courts are maximally adversarial processes bent on misreading legislation to achieve their perverted ends, I would bet that the relevant courts would not in fact rule that a bunch of deepfaked child porn counted as “Other grave harms to public safety and security that are of comparable severity to the harms described in subparagraphs (A) to (C), inclusive”, where those other things are “CBRN > mass casualties”, “cyberattack on critical infra”, and “autonomous action > mass casualties”. Happy to take such a bet at 2:1 odds.
But there are some simpler reasons that particular hypothetical fails:
Image models are just not nearly as expensive to train, so it’s unlikely that they’d fall under the definition of a covered model to begin with.
Even if someone used a covered multimodal model, existing models can already do this.
See:
(2) “Critical harm” does not include any of the following:
(A) Harms caused or materially enabled by information that a covered model or covered model derivative outputs if the information is otherwise reasonably publicly accessible by an ordinary person from sources other than a covered model or covered model derivative.
I’m not sure if you intended the allusion to “the tendentious assumption in the other comment thread that courts are maximally adversarial processes bent on misreading legislation to achieve their perverted ends”, but if it was aimed at the thread I commented on… what? IMO it is fair game to call out as false the claim that
It only counts if the $500m comes from “cyber attacks on critical infrastructure” or “with limited human oversight, intervention, or supervision....results in death, great bodily injury, property damage, or property loss.”
even if deepfake harms wouldn’t fall under this condition. Local validity matters.
I agree with you that deepfake harms are unlikely to be direct triggers for the bill’s provisions, for similar reasons as you mentioned.
Not your particular comment on it, no.
Child porn is frequently used to justify all sorts of highly invasive privacy interventions. ChatGPT * seems to think it would be a public safety threat under Californian law.
Existing models can produce pictures but not video. A complex multimodal model might be able to produce video porn.
Better models might produce deepfake audio with less data and closer to how the person actually speaks.
There’s also the question of whether deepfake porn or faked audio is “accessible information” in the sense of that paragraph (2)(A). That paragraph clearly absolves a model if you can read how to build a bomb in an already existing textbook.
ChatGPT * does seem to think that pictures and audio would count as “information”, but it’s less clear to me when it comes to the word “accessible”.
* I think ChatGPT has a much better understanding of Californian law than I do; at the same time it might also be wrong, and I’m happy to hear from someone with actual legal experience if ChatGPT interprets the words wrongly.
reasonably publicly accessible by an ordinary person from sources other than a covered model or covered model derivative
Seems like it’d pretty obviously cover information generated by non-covered models that are routinely used by many ordinary people (as open source image models currently are).
As a sidenote, I think the law is unfortunately one of those pretty cursed domains where it’s hard to be very confident of anything as a layman without doing a lot of your own research, and you can’t even look at experts speaking publicly on the subject since they’re often performing advocacy, rather than making unbiased predictions about outcomes. You could try to hire a lawyer for such advice, but it seems to be pretty hard to find lawyers who are comfortable giving their clients quantitative (probabilistic) and conditional estimates. Maybe this is better once you’re hiring for e.g. general counsel of a large org, or maybe large tech company CEOs have to deal with the same headaches that we do. Often your best option is to just get a basic understanding of how relevant parts of the legal system work, and then do a lot of research into e.g. relevant case law, and then sanity-check your reasoning and conclusions with an actual lawyer specialized in that domain.
Deepfake porn of a particular person is not information that’s generated by non-covered models that are routinely used by many ordinary people, even if the models could generate the porn if instructed to do so.
Almost no specific (interesting) output is information that’s already been generated by any model, in the strictest sense.
If I tell a model to write me a book summary, that book summary can be specific interesting output without containing any new information.
If I want to know how to build a bomb, there are already plenty of sources out there on how to build a bomb. The information is already accessible from those sources. When an LLM synthesizes the existing information in its training data to help someone build a bomb it’s not inventing new information.
Deepfakes aren’t about simply repeating information that’s already in the training data.
So the argument would be that the lawmaker chose to say “accessible” because they wanted to allow LLMs to synthesize the existing information in their training data and repeat it back to the user, but that does not mean the lawmaker intended to allow LLMs to produce new information that gets used to create harm, even if there are other ways to create that information.