Meta Accused of Using Pirated Books for AI Training

The Meta AI logo is displayed in this image from September 28, 2023. Reuters

Sananda

January 10, 2025 Tags: Meta Copyright Infringement AI

A group of authors has accused Meta Platforms, the parent company of Facebook, of using pirated versions of copyrighted books to train its artificial intelligence systems with the approval of CEO Mark Zuckerberg. The authors, who include Ta-Nehisi Coates and comedian Sarah Silverman, are suing Meta for copyright infringement. The allegations, made in court papers released publicly on Wednesday, state that internal Meta documents show the company knew the books it used were pirated.

The legal battle centers on Llama, Meta's large language model, which the company allegedly trained on a dataset that included millions of pirated works. The authors argue that Meta used copyrighted material, including their books, without permission. The lawsuit is part of a larger trend of authors, artists, and other creators taking legal action against tech companies for using their copyrighted works to train AI systems without consent.

In their latest filings, the authors present new evidence that they say shows Meta used the AI training dataset LibGen. The dataset, they argue, contains pirated copies of their books and other copyrighted works, and Meta allegedly distributed it through peer-to-peer torrents. The authors point to internal Meta communications that they say show Zuckerberg approved the use of LibGen despite concerns within the company about its legitimacy. One message reportedly refers to LibGen as “a dataset we know to be pirated.”

Meta has not yet responded to requests for comment regarding these new allegations.

The case, initially filed in 2023, asserts that Meta used the authors' works to build its AI systems without authorization, which the authors say constitutes copyright infringement. It is one of several lawsuits against tech companies accused of improperly using copyrighted materials to develop AI tools. The defendants in these cases argue that their use of the copyrighted works qualifies as “fair use,” a legal doctrine that permits the use of copyrighted material without permission in certain circumstances.

Last year, U.S. District Judge Vince Chhabria dismissed some of the claims in the authors' lawsuit. Specifically, he ruled that the text generated by Meta's chatbots did not infringe the authors' copyrights and that Meta had not unlawfully stripped their books of copyright management information (CMI). The authors have now filed an updated complaint, citing new evidence that they believe supports their infringement claims and justifies reviving their CMI claim. They have also added a new claim related to computer fraud.

During a recent hearing, Judge Chhabria expressed skepticism about the new claims but agreed to let the authors file an amended complaint. While he acknowledged the new evidence, he questioned whether the claims would hold up in court.

This ongoing legal case highlights growing concerns about how tech companies are using copyrighted material to train AI systems and the potential legal consequences they may face. As AI technology continues to evolve, the issue of intellectual property rights and how they apply to machine learning remains a contentious and unresolved matter.
