I’m not a lawyer and this is not legal advice, but I don’t think the current US legal framework will support a successful challenge to training on publicly available data.
One argument that a use is fair use is that it is transformative [1]. Taking an image or text and using it to slightly influence a giant matrix of numbers, in such a way that the original is not recoverable and that enables new kinds of expression, seems likely to count as transformative.
So if you think that restricting access to public data for training purposes is a promising approach [2], you should probably focus on trying to create a new regulatory framework.
Having said that, this is all US analysis. Other countries have other frameworks and may not have exact analogs of fair use. Perhaps legal challenges are more viable in the EU.
[1] https://www.nolo.com/legal-encyclopedia/fair-use-what-transformative.html
[2] You should think about what the side effects would be. For instance, this would advantage giant companies that can afford to license or create data, as well as places with little respect for the law. Whether that’s desirable is worth thinking through.
From the overview, it seems like there’s one Supreme Court case and the other cases are from lower courts.
Transformativeness is only one of the factors the Supreme Court weighed in that case.
The Supreme Court case (Campbell v. Acuff-Rose Music, Inc.) does say:
As to the music, this Court expresses no opinion whether repetition of the bass riff is excessive copying, but remands to permit evaluation of the amount taken, in light of the song’s parodic purpose and character, its transformative elements, and considerations of the potential for market substitution.
(f) The Court of Appeals erred in resolving the fourth §107 factor, “the effect of the use upon the potential market for or value of the copyrighted work,” by presuming, in reliance on Sony, supra, at 451, the likelihood of significant market harm based on 2 Live Crew’s use for commercial gain. No “presumption” or inference of market harm that might find support in Sony is applicable to a case involving something beyond mere duplication for commercial purposes. The cognizable harm is market substitution, not any harm from criticism. As to parody pure and simple, it is unlikely that the work will act as a substitute for the original, since the two works usually serve different market functions. The fourth factor requires courts also to consider the potential market for derivative works. See, e. g., Harper & Row, supra, at 568. If the later work has cognizable substitution effects in protectable markets for derivative works, the law will look beyond the criticism to the work’s other elements. 2 Live Crew’s song comprises not only parody but also rap music. The absence of evidence or affidavits addressing the effect of 2 Live Crew’s song on the derivative market for a nonparody, rap version of “Oh, Pretty Woman” disentitled 2 Live Crew, as the proponent of the affirmative defense of fair use, to summary judgment. Pp. 20-25.
If you sued over DALL-E 2, you would likely argue that DALL-E 2 does create market harm by creating competition.
Creating competition doesn’t count as harm; it has to be direct substitution for the work in question. That’s a pretty high bar.
Also, there are models like Stable Diffusion that arguably aren’t commercial (the code and model are freely available), which further undercuts the commercial-purpose angle.
I’m not saying any of this is dispositive; that’s the nature of balancing tests. Still, I think this is going to be a tough row to hoe, and the argument that copyright should prevent ML training on publicly available data is certainly not a slam dunk.
(Still not a lawyer, still not legal advice!)
Let’s say an artist draws images in a highly distinctive style. Afterwards, many DALL-E 2 images get generated in the same style. That would make the style less distinctive and less valuable.
That’s a plausible financial harm of a kind that wasn’t present in the case the Supreme Court ruled on.
I don’t believe it’s a slam dunk either, but I do believe there’s room for the Supreme Court to decide either way. The fact that the outcome is genuinely open suggests that spending money on building a stronger legal argument is valuable.
You can’t copyright a style.