Microsoft, OpenAI hit with fresh lawsuit

January 08, 2024 11:34 AM

OpenAI and Microsoft, its chief financial backer, have been hit with another lawsuit by a pair of nonfiction authors, Reuters reports.

On Friday, the authors sued OpenAI and Microsoft for allegedly misusing their work to train the AI models behind ChatGPT and other AI-based services.

Nicholas Basbanes and Nicholas Gage told the court that the companies infringed their copyrights by including their books as part of data used to train OpenAI’s GPT large language model.

The lawsuit comes after the New York Times became the first major US media organisation to sue OpenAI.

That lawsuit alleged that millions of the newspaper’s articles were used without permission to train ChatGPT.

Simon Newcomb, a technology lawyer based in Australia and a partner at Clayton Utz, says that, based on the examples in the New York Times’ complaint, it seems hard to deny that the GPT model produces infringing output.

“However, the bigger issue is likely to be whether the creation of the model itself by collating the data and training a model with it is an infringement.

“That is a massive issue for the whole AI industry as it determines whether models can be created at all by training them on publicly available data without a licence.”

A fundamental question for the courts in resolving this issue, in the Times’ case and in other AI copyright cases, will be whether creating AI models with third-party data constitutes “fair use” under US copyright law, Newcomb says.

The Times’ complaint anticipates this defence argument and rejects it, saying: “Publicly, Defendants insist that their conduct is protected as ‘fair use’ because their unlicensed use of copyrighted content to train GenAI models serves a new ‘transformative’ purpose.”

The complaint continues: “But there is nothing ‘transformative’ about using The Times’s content without payment to create products that substitute for The Times and steal audiences away from it. Because the outputs of Defendants’ GenAI models compete with and closely mimic the inputs used to train them, copying Times works for that purpose is not fair use.”

Newcomb notes that many US academics and commentators take the opposite position: that training a model is fair use.

“With strong views on both sides, ultimately, a decision from a superior court is likely to be needed for the industry to accept a common position on this issue.”