Which the old version certainly would have done. The central thing the bill intends to do is require effective watermarking for all AIs capable of fooling humans into thinking their output is ‘real’ content, and labeling of all content everywhere.
OpenAI is known to have been sitting on a 99.9% effective (by their own measure) watermarking system for a year. They chose not to deploy it, because it would hurt their business – people want to turn in essays and write emails, and would rather the other person not know that ChatGPT wrote them.
As far as we know, no other company has similar technology. It makes sense that OpenAI would want watermarking mandated everywhere.
Is watermarking actually really difficult? The overall concept seems straightforward, the most obvious ways to do it don’t require any fiddling with model internals (so you don’t need AI expertise, or expensive human work for your specific system like RLHF), and Scott Aaronson claims that a single OpenAI engineer was able to build a prototype pretty quickly.
I imagine that if this becomes law, some academics could probably hack together an open source solution quickly. So I’m skeptical that the regulatory capture angle is particularly strong.
(I might be too optimistic about the engineering difficulties and amount of schlep needed, of course).
If the academics can hack together an open source solution, why haven’t they? Seems like it would be a highly cited, very popular paper. What’s the theory on why they don’t do it?
Just spitballing, but it doesn’t seem theoretically interesting to academics unless they’re bringing something novel (algorithmically or in design) to the table, and it’s not practically useful unless implemented widely, since it’s trivial for e.g. college students to use the least watermarked model.
No one would use it if not forced to?
Two responses.
One, even if no one used it, there would still be value in demonstrating it was possible. If academia only develops things people will adopt commercially right away, then we might as well dissolve academia. This is a highly interesting and potentially important problem; people should be excited.
Two, there would presumably at minimum be demand to give students (for example) access to a watermarked LLM, so they could benefit from it without being able to cheat. That’s even an academic motivation. And if the major labs won’t do it, someone can build a Llama version or whatnot for this, no?
Yeah, I think the simplest thing for image generation is for model hosting providers to use a separate tool—and lots of work on that already exists. (see, e.g., this, or this, or this, for different flavors.) And this is explicitly allowed by the bill.
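To make the ‘separate tool’ idea concrete, here is a minimal sketch of a post-hoc image tag, assuming a hosting provider stamps outputs after generation. It is a toy least-significant-bit scheme written with Pillow, not one of the robust approaches in the work linked above (it does not survive JPEG re-encoding or cropping), and the payload and function names are only illustrative, but it shows that this step never needs to touch the generating model.

```python
from PIL import Image

TAG = b"ai-generated"  # hypothetical payload a hosting provider might embed

def embed_tag(path_in, path_out, payload=TAG):
    """Write the payload bits into the least-significant bit of the red channel.
    Toy scheme: survives lossless PNG round-trips, not JPEG re-encoding or crops."""
    img = Image.open(path_in).convert("RGB")
    pixels = list(img.getdata())
    bits = [(byte >> i) & 1 for byte in payload for i in range(8)]
    if len(bits) > len(pixels):
        raise ValueError("image too small for payload")
    stamped = [
        ((r & ~1) | bits[idx], g, b) if idx < len(bits) else (r, g, b)
        for idx, (r, g, b) in enumerate(pixels)
    ]
    out = Image.new("RGB", img.size)
    out.putdata(stamped)
    out.save(path_out, "PNG")

def read_tag(path, length=len(TAG)):
    """Recover `length` bytes from the red-channel LSBs."""
    pixels = list(Image.open(path).convert("RGB").getdata())
    return bytes(
        sum((pixels[i * 8 + j][0] & 1) << j for j in range(8))
        for i in range(length)
    )
```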
For text, it’s harder to do well, and you only get weak probabilistic identification, but it’s also easy to implement an Aaronson-like scheme, even if doing it really well is harder. (I say easy because I’m pretty sure I could do it myself, given, say, a month working with one of the LLM providers, and I’m wildly underqualified to do software dev like this.)
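For the text case, here is a minimal sketch of the kind of Aaronson-style scheme being described, assuming you can hook the sampler and see per-token probabilities (the key, context length, and scoring details are placeholders I have made up). A keyed hash of the recent context gives every vocabulary item a pseudorandom score, sampling picks the token maximizing r^(1/p), which still draws from the model’s distribution, and detection re-derives the scores to check whether the chosen tokens score suspiciously high. This is also why no model internals are needed beyond the output probabilities.

```python
import hashlib
import hmac
import math
import random

SECRET_KEY = b"replace-with-a-real-secret"  # placeholder key known only to the provider
CONTEXT_LEN = 4                             # how many preceding tokens seed the scores

def pseudorandom_scores(context_tokens, vocab_size):
    """Deterministic r in (0,1) for every vocabulary item, keyed by the secret
    and the last CONTEXT_LEN tokens (a real system would use a per-token PRF)."""
    seed = hmac.new(SECRET_KEY,
                    " ".join(map(str, context_tokens[-CONTEXT_LEN:])).encode(),
                    hashlib.sha256).digest()
    rng = random.Random(seed)
    return [rng.random() for _ in range(vocab_size)]

def watermarked_sample(probs, context_tokens):
    """Pick argmax_i r_i ** (1 / p_i) (the Gumbel/exponential-minimum trick).
    Marginally this still samples from probs, but the choice is now a
    deterministic function of (key, context), which detection exploits."""
    r = pseudorandom_scores(context_tokens, len(probs))
    best, best_score = 0, -1.0
    for i, p in enumerate(probs):
        if p <= 0:
            continue
        score = r[i] ** (1.0 / p)
        if score > best_score:
            best, best_score = i, score
    return best

def detection_score(token_ids, vocab_size):
    """Average of -ln(1 - r) over chosen tokens: about 1 for ordinary text,
    noticeably higher when the text was sampled with watermarked_sample."""
    total = 0.0
    for t in range(CONTEXT_LEN, len(token_ids)):
        r = pseudorandom_scores(token_ids[:t], vocab_size)[token_ids[t]]
        total += -math.log(max(1e-12, 1.0 - r))
    return total / max(1, len(token_ids) - CONTEXT_LEN)
```

The detection statistic averages about 1 on unwatermarked text and drifts higher on watermarked text, so longer passages give more confident calls, which is exactly the ‘weak probabilistic identification’ caveat above; paraphrasing or heavy editing washes the signal out.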