You are aware that without these 'awful newspapers' (which provide a large part of the training material) the AI would still be quite stupid and wouldn't be able to "write" such an article? Still, even now a lot of outlets out there use AI and it's just painful to read.
It's sad that some are so blinded and hooked on yet another tech bandwagon that they want to ignore everything and praise their new "golden idol" without even batting an eyelash...
I'm annoyed by all the big corps that crawl and scrape the Web and then make bazillions of dollars without creating proper content (I'm looking at you, Google) and then claim "everyone could see it so we just stole it"... ffs, how devoid of any morals would you have to be?
Look to the law and precedent.
Authors Guild v. Google found transformative use to be fair.
Crawling publicly available websites is legal globally: by fair use in the US, by politicians' "permission" in the EU.
And most damning to the AI training handwringers, the Fair Use tests:
To determine whether a proposed use is a fair use, you must consider the following four factors:
- Purpose: The purpose and character of the use, including whether such use is of a commercial nature, or is for nonprofit educational purposes.
- Nature: The nature of the copyrighted work.
- Amount: The amount and substantiality of the portion used in relation to the copyrighted work as a whole.
- Effect: The effect of the use upon the potential market for, or value of, the copyrighted work.
By precedent the four factors resolve into two main tests: transformation and substitution. The first is about how the allegedly infringing product compares to the allegedly copied product. The second is about whether the allegedly infringing product *itself* can substitute significantly for the copyrighted product in the market.
Note that, first, the NYT is suing over the training of LLMs, not the output of the models, which is what might have market impact. LLMs themselves aren't sold in lieu of newspapers. That will have to factor into the substitution question.
Second, the damage they attribute to LLM training has been ongoing for decades, since long before LLMs hit the mainstream, and in recent times in particular their revenues have declined from reduced ad spending. That will have to factor into the claim of harm from the training of LLMs.
The NYT has spun one narrative that paints them as victims of a new tech.
A closer look at the facts suggests a different narrative: that they are operating a declining business with a product in a declining market, and they are desperately looking for a payout from a different business to minimize their ongoing losses.
This lawsuit strongly resembles the Google lawsuit that was ruled fair use, in that it is about "scanning" content to create a technical product. Google scanned to create a database of citations to index the web; LLM training "scans" for language use and data to create a model of how humans perceive and interact with both.
In similar recent lawsuits against LLM training, the judges have so far been skeptical of the plaintiffs' claims, demanding specific examples of infringement.
The courts will decide, but I doubt the Vegas oddsmakers will be favoring the NYT.
Those who don't like the current legal framework are free to petition Congress for changes.