OpenAI to grant authors access to training data in landmark copyright case

OpenAI will allow authors suing the company to inspect data used to train its artificial intelligence models in an ongoing copyright lawsuit. This marks the first time the AI firm has agreed to provide access to this information, potentially setting the stage for a pivotal legal battle over the use of copyrighted works in AI development.

The lawsuit, brought by authors including Sarah Silverman, Paul Tremblay and Ta-Nehisi Coates, alleges that OpenAI used their copyrighted works without permission to train its AI system, ChatGPT.

The authors claim their books were taken from online sources and used to generate summaries of their work.

Comedian Sarah Silverman and two other authors filed lawsuits against OpenAI and Meta, accusing the companies of copyright infringement. — Invision

As part of an agreement, OpenAI will allow the authors’ representatives to inspect the data at the company’s San Francisco office.

The review will take place under strict security measures, including a no-internet policy and the prohibition of recording devices. Reviewers must sign non-disclosure agreements and will have limited use of a computer for note-taking, under the supervision of OpenAI.

The lawsuit is one of several high-profile cases against AI companies accused of using copyrighted material to train machine learning models.

OpenAI has previously stated that its systems are trained using publicly available datasets, which may include copyrighted works. The company may argue that this practice falls under fair use, a legal doctrine that allows limited use of copyrighted material under certain conditions.

A U.S. court previously dismissed some of the authors’ claims, including allegations of unfair business practices and negligence. However, their claim of direct copyright infringement remains active.

The outcome of the case could set important legal precedents for the future of AI and the use of copyrighted material in training data.

Tags: AI, Artificial Intelligence, ChatGPT, Copyright infringement, Copyright law, Generative AI, OpenAI, Sarah Silverman
