New York Times sues Microsoft and OpenAI for impacting its business, claims generative AI models don't qualify for fair use

Windows Central · Dec 27, 2023

While regulators are already investigating the question of fair use regarding AI bots like ChatGPT, it seems the New York Times wants to press the issue in the courts instead.


GraniteStateColin · Dec 27, 2023

Interesting subject. In general, I strongly favor enforcing copyrights and, when in doubt or when the law is murky, erring in favor of the creator over those using the creator's work. At the same time, I generally frown on large payments for damages, which plaintiffs and their attorneys tend to inflate beyond all reason.

I do think there's an irony here with the NYT in particular, the paper that fired its chief editor for allowing a conservative editorial (by a sitting Senator, no less) with nothing resembling parallel behavior for leftist editorials even by disreputable sources, per NYT's own admission (under the contradictory logic that it's good to hear from contrarian voices). For the NYT to now claim to have superior content (and for MS to treat it as such, if that accusation is true) seems comical.

The NYT does do news (and occasionally some good investigative reporting), but if there is any political angle, even a secondary one, the news and facts are heavily subordinated to the "narrative" that fits their far left political views.

Still, between my disdain for the NYT's bias and my support of copyright, for me, the copyright protection is more important. Whether we agree or disagree with someone, we should all want their rights to their own words and thoughts protected, just as we would want done with our own. IF (don't know the facts here) MS is using NYT writing without compensating the NYT to build the text composition for its chat responses, then the NYT is correct in demanding recompense. However, if it's just using the facts as others do who summarize or re-report the news, then that is not a copyright infringement.

An interesting case for sure. Another question worthy of debate on this: is this something that the courts should settle under existing copyright law, or is AI sufficiently unique in its application here that new legislation is needed, in which case, what should that legislation be?

fjtorres5591 · Dec 27, 2023

At least the NYT is honest in saying the lawsuit is a desperation money grab.
That it took until August to block the web crawler is either an indication of implied consent or technological cluelessness. Neither will be helpful against a good legal team.

Also noteworthy is their filing being in Manhattan, with its history of pro-publishing bias, rather than California or Seattle, homes of OpenAI and Microsoft respectively.

Expect round one to be a fight over venue.

fdruid · Dec 28, 2023

This is pretty pathetic to read. Someone wants to make an example out of this.
But you just can't stop the future. We're not going back to buying paper news, nor paying for individual news outlets.

larakurst · Dec 28, 2023

Because of the scale of these lawsuits, who they're suing, and the way the tool has been so rapidly integrated into everything, I think that each additional lawsuit increases the likelihood that it will be found not to be copyright infringement in the first place, or that they'll just have to make some minor modification to the tool. Either way, they're not going to have to pay anything. Or it'll be pocket change.

wojtek · Dec 28, 2023

fdruid said:
This is pretty pathetic to read. Someone wants to make an example out of this.
But you just can't stop the future. We're not going back to buying paper news, nor paying for individual news outlets.

You are aware that without these 'awful newspapers' (which provide a bulk of the training material) the AI would still be quite stupid and wouldn't be able to "write" such an article? Still, even now a lot of outlets out there use AI and it's just painful to read.

It's sad that some are so blindsighted and hooked on yet another tech bandwagon that they want to ignore everything and praise their new "gold idol" without even batting an eyelash...

I'm annoyed by all the big corps that crawl and scrape the Web and then make bazillions of dollars without creating proper content (I'm looking at you, Google) and then claim "everyone could see it so we just stole it"... ffs, how devoid of any morals would you have to be?

fjtorres5591 · Dec 28, 2023

wojtek said:
You are aware that without these 'awful newspapers' (which provide a bulk of the training material) the AI would still be quite stupid and wouldn't be able to "write" such an article? Still, even now a lot of outlets out there use AI and it's just painful to read.

It's sad that some are so blindsighted and hooked on yet another tech bandwagon that they want to ignore everything and praise their new "gold idol" without even batting an eyelash...

I'm annoyed by all the big corps that crawl and scrape the Web and then make bazillions of dollars without creating proper content (I'm looking at you, Google) and then claim "everyone could see it so we just stole it"... ffs, how devoid of any morals would you have to be?

Look to the law and precedent.
Authors Guild v. Google found transformative use to be fair.
Web crawling of publicly available websites is legal globally, by fair use in the US, by politician "permission" in the EU.

And most damning to the AI training handwringers, the Fair Use tests:

To determine whether a proposed use is a fair use, you must consider the following four factors:

Purpose: The purpose and character of the use, including whether such use is of a commercial nature, or is for nonprofit education purposes.
Nature: The nature of the copyrighted work.
Amount: The amount and substantiality of the portion used in relation to the copyrighted work as a whole.
Effect: The effect of the use upon the potential market for, or value of, the copyrighted work.

By precedent the tests resolve into two main tests: transformation and substitution. The first is about how the allegedly infringing product compares to the allegedly copied product. The second is about whether the alleged infringing product *itself* can substitute significantly for the copyrighted product itself in the market.

Note that, first, the NYT is suing for the training of LLM models, not the output of the models, which is what might have market impact. LLM models themselves aren't sold in lieu of newspapers. That will have to factor in the substitution question.

Second, the damage they attribute to LLM model training has been ongoing for decades, long before LLMs hit the mainstream, and in recent times particularly their revenues have declined from reduced ad spending. That will have to factor into the claim of harm from the training of LLM models.

The NYT has spun one narrative that paints them as victims of a new tech.
A closer look at the facts suggests a different narrative, that they are operating a declining business with a product with a declining market and they are desperately looking for a payout from a different business to minimize their ongoing losses.

This lawsuit strongly resembles the Google lawsuit that was ruled fair use in that it is about "scanning" content to create a technical product. Google scanned to create a database of citations to index the web, LLM training "scans" for language use and data to create a model of how humans perceive and interact with both.

In similar recent lawsuits against LLM training the judges have so far been skeptical of the plaintiffs' claims, demanding specific examples of infringement.

The courts will decide but I doubt the Vegas odds makers will be favoring the NYT.

Those that don't like the current legal framework are free to petition the Congress for changes.

wojtek · Dec 28, 2023

fjtorres5591 said:
Look to the law and precedent.

Fortunately I'm not from a place that runs on precedent law.... phew.

fjtorres5591 said:
Web crawlers through publicly available websites is legal globally, by fair use in the US, by politician "permission" in the EU.

Erm... in the same way that any attempt at pushback against extortionist practices by our beloved Internet monopolies, e.g. Canadian and Australian publishers' attempts at blocking Google, resulted in Google removing them from the results... if only we weren't run by monopolies that built their position on stealing others' work, I wonder... but I guess it's all good and dandy, as Google is modern and publishers are old and musty and dying, so it's ok to steal... clappity-clap

fjtorres5591 · Dec 28, 2023

wojtek said:
Fortunately I'm not from a place that runs on precedent law.... phew.

Erm... in the same way that any attempt at pushback against extortionist practices by our beloved Internet monopolies, e.g. Canadian and Australian publishers' attempts at blocking Google, resulted in Google removing them from the results... if only we weren't run by monopolies that built their position on stealing others' work, I wonder... but I guess it's all good and dandy, as Google is modern and publishers are old and musty and dying, so it's ok to steal... clappity-clap

What exactly is stolen by LLM?
Can you unsubscribe from the NYT and get a news feed from ChatGPT?

This isn't a case of NYT whining about a competitor in their business taking away their customers (remember Tom Hanks' YOU'VE GOT MAIL) but about losing customers for whatever reason and trying to blame a random business somewhere else. The court may very well demand specific examples that don't involve the LLM model invoking a browser.

As to Google, their business is built on indexing what is *freely* available online and *sending* traffic to those sites.

Google has many sins to answer for, but the search engine itself isn't one. Paying billions to phone vendors to block alternatives? They're in court to answer for that.

Bear in mind that not only are crawlers legal, there is also a long-established mechanism to prevent your "precious" from being visited by crawlers. It is called robots.txt. Look it up. The NYT article explicitly says they *didn't* block crawlers before August. So, either they had no objection to being visited or they were operating a web site without understanding the basics. Neither is an excuse to be *retroactively* demanding a toll tax.
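
For anyone who hasn't looked it up, here is a minimal sketch of how a well-behaved crawler might consult robots.txt before fetching a page, using Python's standard urllib.robotparser. The robots.txt contents and the example.com URL are made up for illustration (this is not the NYT's actual file), and it assumes the crawler identifies itself with the "GPTBot" user agent, which is the name OpenAI publishes for its crawler. "SomeOtherBot" is a hypothetical name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one named crawler, allow everyone else.
# These rules are illustrative only, not copied from any real site.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical article URL on a placeholder domain.
    url = "https://example.com/2023/12/27/some-article.html"
    for agent in ("GPTBot", "SomeOtherBot"):
        print(agent, is_allowed(ROBOTS_TXT, agent, url))
```

Note that robots.txt is a voluntary convention: it only works if the crawler chooses to check it, as the sketch does.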

Again, the courts will speak soon enough.

fjtorres5591 · Dec 28, 2023

larakurst said:
Because of the scale of these lawsuits, who they're suing, and the way the tool has been so rapidly integrated into everything, I think that each additional lawsuit increases the likelihood that it will be found not to be copyright infringement in the first place, or that they'll just have to make some minor modification to the tool. Either way, they're not going to have to pay anything. Or it'll be pocket change.

Apple, who is yet again late to the party, is reported to be talking to the NYC glass tower publishers to pay them to train a model off their books. The offer? $50M.

"Go 'way kid, don' bother me." -- Foghorn Leghorn.

That is about what the NYT might aspire to at best.
Not going to save them even in the unlikely case their friendly judge ignores precedent.

GraniteStateColin · Dec 31, 2023

@fjtorres5591 , @wojtek , @larakurst , @fdruid , interesting points and discussion above. Two points:

1. A judicial review of the law generally does not (and should never) care how many people are using something to determine if it's legal or not. Granted, 100M users may warrant more careful consideration than 100 users, but ultimately assessing who is protected and who is not should not be a function of the number of people on either side of that scale. Therefore, suggesting that "this is the future" or "this is good for XXXXX" should be irrelevant. It's either legal or it's not. (I would acknowledge, that there are some partial exceptions to this in, for example, the realm of mandatory patent licensing to cell phone manufacturers, but the patent owners are still paid for those licenses, more like the legal logic behind eminent domain.)

2. It is very possible that the law does need to be rewritten for AI use of copyrighted material. It is a fundamentally different consideration. I think the courts will hold that copyright holders have rights over their work. But in the past, where an explanatory summary (as opposed to a straight summary with sampling, like for Cliff Notes who do pay for rights to the original works) on the work of others has never been considered copyright infringement, the ability of AI to instantly rewrite something in entirely new words and pull in data from multiple sources would effectively render copyrights on nonfiction irrelevant, or, at best, effectively require nonfiction writers to use a peculiar style so customers are buying for the writer's personality rather than the content. Traditionally, this usage of prior works would have been considered standard research and would not have been copyright infringement, but in a world where AI can do that instantly and rewrite the work, the original people who put in the effort are prevented from monetizing their work product, which is the core reason for copyright law in the first place.

I don't think there is a simple answer to this, though hopefully we end up at a place where most of us can agree that the solution is fair in its protection of the intellectual property rights of the creators without merely raising new hurdles to productivity or holding back the useful advances in technology. One proposal I've heard (from Rick Beato, mostly on the music side of AI), is that it should not be possible to monetize AI creations -- use them if you want, but you can't then sell the product, which reduces the incentive to steal. Not sure how this plays into search engine advertising monetization (and recall the damage Napster did with free sharing of music), but it seems like a good foundation or starting point for driving the thought around this.
