Sarah Silverman sues OpenAI, Meta for copyright infringement

July 11, 2023

Karah Rucker, Anchor/Reporter

Comedian and author Sarah Silverman, along with two other authors, filed lawsuits against OpenAI and Meta, accusing the companies of copyright infringement. The lawsuits allege that the companies used the authors’ content without permission to train artificial intelligence language models.

“On information and belief, to train the OpenAI Language Models, OpenAI relied on harvesting mass quantities of textual material from the public internet, including Plaintiffs’ books, which are available in digital formats,” the lawsuit against OpenAI, filed on Friday, July 7, read. “Because the OpenAI Language Models cannot function without the expressive information extracted from Plaintiffs’ works (and others) and retained inside them, the OpenAI Language Models are themselves infringing derivative works, made without Plaintiffs’ permission and in violation of their exclusive rights under the Copyright Act.”

The copyright infringement lawsuits against OpenAI and Meta accuse the companies of making copies of Silverman’s and the other authors’ works by scraping them from illegal “shadow libraries.” The libraries contain the texts of thousands of books, including Silverman’s “The Bedwetter” as well as plaintiff Christopher Golden’s book “Ararat” and plaintiff Richard Kadrey’s book “Sandman Slim.”

“Plaintiffs are entitled to statutory damages, actual damages, restitution of profits, and other remedies provided by law,” the lawsuit read.

The lawsuit against Meta cites the company’s own research paper about LLaMA, its large language model. According to the paper, made public in February, scientists included text from The Pile within their training dataset. The lawsuit says some of that text comes from shadow libraries.

Less is known about the source of training datasets for OpenAI’s ChatGPT program. But the lawsuit states that ChatGPT’s ability to generate summaries of the plaintiffs’ works is “only possible if ChatGPT was trained on Plaintiffs’ copyrighted works.”