Child porn is frequently used to justify all sorts of highly invasive privacy interventions. ChatGPT * seems to think it would be a public safety threat under Californian law.
Existing models can generate pictures but not video. A more complex multimodal model might be able to generate video porn.
Better models might produce deep fake audio from less data and sound closer to how the person actually speaks.
There’s also the question of whether deep fake porn or faked audio counts as “accessible information” in the sense of paragraph (2)(A). That paragraph clearly absolves a model if you can already read how to build a bomb in an existing textbook.
ChatGPT * does seem to think that pictures and audio would fall under “information”, but it’s less clear to me when it comes to the word “accessible”.
* I think ChatGPT has a much better understanding of Californian law than I do; at the same time it might also be wrong, and I’m happy to hear from someone with actual legal experience if ChatGPT interprets these words incorrectly.
“reasonably publicly accessible by an ordinary person from sources other than a covered model or covered model derivative”
Seems like it’d pretty obviously cover information generated by non-covered models that are routinely used by many ordinary people (as open source image models currently are).
As a sidenote, I think the law is unfortunately one of those pretty cursed domains where it’s hard to be very confident of anything as a layman without doing a lot of your own research, and you can’t even look at experts speaking publicly on the subject since they’re often performing advocacy, rather than making unbiased predictions about outcomes. You could try to hire a lawyer for such advice, but it seems to be pretty hard to find lawyers who are comfortable giving their clients quantitative (probabilistic) and conditional estimates. Maybe this is better once you’re hiring for e.g. general counsel of a large org, or maybe large tech company CEOs have to deal with the same headaches that we do. Often your best option is to just get a basic understanding of how relevant parts of the legal system work, and then do a lot of research into e.g. relevant case law, and then sanity-check your reasoning and conclusions with an actual lawyer specialized in that domain.
Deep fake porn of a particular person is not information that’s generated by non-covered models that are routinely used by many ordinary people, even if those models could generate the porn if instructed to do so.
Almost no specific (interesting) output is information that’s already been generated by any model, in the strictest sense.
If I tell a model to write me a book summary, that book summary can be specific, interesting output without containing any new information.
If I want to know how to build a bomb, there are already plenty of sources out there on how to build a bomb. The information is already accessible from those sources. When an LLM synthesizes the existing information in its training data to help someone build a bomb, it’s not inventing new information.
Deep fakes aren’t about simply repeating information that’s already in the training data.
So the argument would be that the lawmaker chose to say “accessible” because they wanted to allow LLMs to synthesize the existing information in their training data and repeat it back to the user, but that does not mean the lawmaker intended to allow LLMs to produce new information that gets used to create harm, even if there are other ways to create that information.