I’ve heard various people recently arguing that all the hubbub about artists’ work being used without permission to train AI makes this a good time to get regulations in place governing the use of data for training.
If you want to have a lot of counterfactual impact there, I think probably the highest-impact set of moves would be:
1. Figure out a technical solution to robustly tell whether a given image or text was used to train a given NN (see the sketch after this list).
2. Bring that to the EA folks in DC. A robust technical test like that makes it pretty easy for them to attach a law/regulation to it. Without a technical test, it’s much harder to make an actually-enforceable law/regulation.
3. In parallel, also open up a class-action lawsuit to directly sue companies using these models. Again, a technical solution to prove which data was actually used in training is the key piece here.
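For concreteness, here’s a minimal sketch of one published approach to that test: loss-threshold membership inference, in the style of Yeom et al. (2018). The intuition is that models fit their training data more tightly than unseen data, so an unusually low loss on a candidate example is evidence it was in the training set. Everything below (the PyTorch classifier, the threshold, the helper names) is an illustrative assumption on my part, not a vetted forensic tool:

```python
# Sketch of loss-threshold membership inference (style of Yeom et al. 2018).
# Assumes a PyTorch classifier `model` and a candidate example (x, y); the
# names and the threshold-calibration story are illustrative assumptions.
import torch
import torch.nn.functional as F

def membership_score(model: torch.nn.Module, x: torch.Tensor, y: torch.Tensor) -> float:
    """Return the model's loss on (x, y); unusually low loss suggests membership."""
    model.eval()
    with torch.no_grad():
        logits = model(x.unsqueeze(0))            # add a batch dimension
        loss = F.cross_entropy(logits, y.unsqueeze(0))
    return loss.item()

def was_likely_trained_on(model, x, y, threshold: float) -> bool:
    # The threshold would be calibrated on held-out data the model was
    # definitely NOT trained on; candidates scoring well below that
    # distribution get flagged as probable training members.
    return membership_score(model, x, y) < threshold
```

A single loss threshold is a weak signal on its own; the published versions calibrate it against shadow models or known non-member data, and that calibration step is where the robustness (and hence the legal defensibility) would have to come from.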
Model/generator behind this: given the active political salience, it probably wouldn’t be too hard to get some kind of regulation implemented. But by default it would end up being something mostly symbolic, easily circumvented, and/or unenforceable in practice. A robust technical component, plus (crucially) actually bringing that robust technical component to the right lobbyist/regulator, is the main thing which would make a regulation actually do anything in practice.
Edit-to-add: also, the technical solution should ideally be an implementation of some method already published in some academic paper. Then when some lawyer or bureaucrat or whatever asks what it does and how we know it works, you can be like “look at this Official Academic Paper” and they will be like “ah, yes, it does Science, can’t argue with that”.