RobertM comments on What prevents SB-1047 from triggering on deep fake porn/voice cloning fraud?

RobertM 27 Sep 2024 18:48 UTC
6 points
0
“reasonably publicly accessible by an ordinary person from sources other than a covered model or covered model derivative”
Seems like it’d pretty obviously cover information generated by non-covered models that are routinely used by many ordinary people (as open source image models currently are).
As a side note, I think the law is unfortunately one of those pretty cursed domains where it’s hard to be very confident of anything as a layman without doing a lot of your own research, and you can’t even rely on experts speaking publicly on the subject, since they’re often performing advocacy rather than making unbiased predictions about outcomes. You could try to hire a lawyer for such advice, but it seems to be pretty hard to find lawyers who are comfortable giving their clients quantitative (probabilistic) and conditional estimates. Maybe this gets better once you’re hiring for e.g. general counsel of a large org, or maybe large tech company CEOs have to deal with the same headaches that we do. Often your best option is to just get a basic understanding of how the relevant parts of the legal system work, do a lot of research into e.g. relevant case law, and then sanity-check your reasoning and conclusions with an actual lawyer specialized in that domain.
- ChristianKl 29 Sep 2024 21:57 UTC
  2 points
  0
  Deep fake porn of a particular person is not information that’s generated by non-covered models that are routinely used by many ordinary people, even if those models could generate the porn if instructed to do so.
  - RobertM 30 Sep 2024 1:47 UTC
    2 points
    0
    Almost no specific (interesting) output is information that’s already been generated by any model, in the strictest sense.
    - ChristianKl 30 Sep 2024 8:24 UTC
      2 points
      0
      If I tell a model to write me a book summary, that book summary can be specific interesting output without containing any new information.
      If I want to know how to build a bomb, there are already plenty of sources out there on how to build one. The information is already accessible from those sources. When an LLM synthesizes the existing information in its training data to help someone build a bomb, it’s not inventing new information.
      Deep fakes aren’t about simply repeating information that’s already in the training data.
      So the argument would be that the lawmaker chose to say “accessible” because they wanted to allow LLMs to synthesize the existing information in their training data and repeat it back to the user. That does not mean the lawmaker intended to allow LLMs to produce new information that gets used to create harm, even if there are other ways to create that information.