Training: it’s not clear to me whether training LLMs on copyrighted content is copyright infringement under current US copyright law. I think lawmakers should introduce regulations to make it an infringement, but I don’t think the courts should consider it an infringement under the current laws (although I might not be familiar with all the relevant case law).
My feeling is that it could have been fair use as long as LLMs were just research projects, but then OpenAI started selling theirs as a product without changing their working model at all, and if you’re commercializing the model, it’s another story. Not sure where open models would fall here, but I still reckon it’s probably copyright infringement, since you’re not using the data only internally. Would like to hear an expert’s opinion on this.
I think that, to the extent LLMs don’t preserve the wording or the creative structure, copyright doesn’t provide protection, and some preservation of the structure might be fair use.
The problem is that this is a new case: if you can have an AI agent visit these websites and then relay a summary to you, it completely destroys their business model, because it denies them the clicks (and ad impressions) they need to sustain themselves. At that point, odds are they’ll just rely on paywalls even more heavily if they’re not protected from this.
ChatGPT hallucinating false info and attributing it to NYT is outside copyright law, but seems bad and damaging.
I could imagine something like a defamation lawsuit? But it would probably have to focus on a specific case, not the general possibility of it? Again, hard to guess; this is all unexplored territory, with new questions that never needed to be asked until now.