The lawsuit says the defendants largely scrape the NYT’s original content to build their models and manufacture responses. “Defendants seek to free-ride on The Times’s massive investment in its journalism,” the complaint says, accusing OpenAI and Microsoft of using content “without payment to create products that substitute for The Times and steal audiences away from it”. Microsoft has a sizable investment in OpenAI.
This is a battle that could frame the legal contours around intellectual property (IP) rights in the age of generative AI platforms. It is also symbolic of the larger debate on how generative AI platforms could affect people in the creative industries, given that such systems are built on the work of creators of original content, which an algorithm then synthesises and the AI systems present as fresh information.
Earlier this year, two US authors also sued OpenAI, claiming in a proposed class action that the company misused their works to “train” ChatGPT.
What is NYT’s main contention against OpenAI and Microsoft?
The lawsuit, filed in the Federal District Court in Manhattan, contends that millions of articles published by the publication were used to train automated chatbots which now compete with the news outlet as a source of reliable information.
NYT has reported that it approached Microsoft and OpenAI in April to raise concerns about the use of its intellectual property and explore “an amicable resolution,” possibly involving a commercial agreement and “technological guardrails” around generative AI products. But the talks had not produced a resolution, the publication said.
The publication also alleges that OpenAI and Microsoft’s large language models, which power ChatGPT and Copilot, “can generate output that recites Times content verbatim, closely summarises it, and mimics its expressive style.” This “undermine[s] and damage[s]” the Times’ relationship with readers, while also depriving it of “subscription, licensing, advertising, and affiliate revenue.”
The “unlawful use” of the paper’s “copyrighted news articles, in-depth investigations, opinion pieces, reviews, how-to guides, and more” to create artificial intelligence products “threatens The Times’s ability to provide that service”, the lawsuit says.
The lawsuit highlights the potential damage to The Times’s brand through A.I. “hallucination”, a phenomenon in which chatbots respond with false information that is then wrongly attributed to a source.
In August, NYT blocked OpenAI’s web crawler, preventing the company from using content from the publication to train its AI models.
AI and IP rights
Generative AI platforms such as ChatGPT and Google’s Bard have ignited a debate on IP rights over original content on the internet.
The responses that AI platforms such as ChatGPT and Bard generate rest on the bedrock of millions of pieces of textual content that creators, including news publishers, have uploaded online.
The music business, too, is pushing back on the use of AI in the industry. Universal Music Group, for instance, has asked streaming services such as Spotify to stop developers from scraping its material to train AI bots to make new songs.
The debate is gaining traction at a time when countries around the world, including India, have archaic copyright laws that need to be rethought with the AI wave in mind. In India, for instance, creative works are regulated under the Copyright Act of 1957.
Under the Act, the “author” of any literary, dramatic, musical or artistic work that is computer-generated is the person who causes the work to be created. But that definition does not take into account that AI systems do not generate information on their own. They are only as good as the base dataset on which they are trained, and that dataset is made up of copyrighted work produced by other authors.