If, in the specific case of the NYT, the articles in question aren’t intended to be publicly accessible, then this isn’t just a copyright matter. But the OP doesn’t just say “there should be regulations to make it illegal to sneak around access restrictions in order to train AIs on material you don’t have access to”; it says there should be regulations to prohibit training AIs on copyrighted material. Which is to say, on pretty much any product of human creativity. And that’s a much broader claim.
Your description at the start of the second paragraph seems kinda tendentious. What does it have to do with anything that the process involves “arrays of numbers”? In what sense do these numbers “represent the work process behind the copyrighted material”? (And in what sense, if any, is that truer of AI systems than of human brains that learn from the same copyrighted material? My guess is that it’s much truer of the humans.) The bit about “increase the likelihood of … producing the copyrighted material” isn’t wrong exactly, but it’s misleading and I think you must know it: what training increases is the likelihood of producing the next token of that material given the context of all the previous tokens, and actually reproducing the input in bulk is very much not a goal.
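To make the distinction concrete, here’s a toy sketch of my own (nothing like a real LLM, just a character-level bigram model with made-up training text): “training” on a text raises the conditional probability of each next character given its context, but the probability of regenerating the whole training string is the product of many such factors and shrinks rapidly.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus, purely for illustration.
text = "the cat sat on the mat"

# "Train": count character-to-character transitions.
counts = defaultdict(Counter)
for prev, nxt in zip(text, text[1:]):
    counts[prev][nxt] += 1

def next_prob(prev, nxt):
    """P(next char | previous char) under the fitted bigram model."""
    total = sum(counts[prev].values())
    return counts[prev][nxt] / total if total else 0.0

# Individual next-token probabilities can be high...
p_th = next_prob("t", "h")

# ...but the probability of emitting the entire training string,
# one conditional step at a time, is the product of all the factors
# and is far smaller than any single one of them.
prob_whole = 1.0
for prev, nxt in zip(text, text[1:]):
    prob_whole *= next_prob(prev, nxt)

print(p_th)        # a sizeable single-step probability
print(prob_whole)  # much smaller: bulk reproduction is not what's optimized
```

The model is optimized to score each next step well, not to memorize the document as a unit; verbatim regurgitation is a (rare, and in real systems actively discouraged) side effect, not the objective.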
It may well be true that all progress on AI is progress toward our doom, but it’s not obviously appropriate to go from that to “so we should pass laws that make it illegal to train AIs on copyrighted text”. That seems a bit like going from “Elon Musk’s politics are too right-wing for my taste and making him richer is bad” to “so we should ban electric vehicles” or from “the owner of this business is gay and I personally disapprove of same-sex relationships” to “so I should encourage people to boycott the business”. In each case, doing the thing may have the consequences you want, but it’s not an appropriate way to pursue those consequences.