Meta Accused of Using Pirated Data for AI Development

Plaintiffs in Kadrey et al. vs. Meta File Motion, Accuse Firm of Knowingly Using Copyrighted Works in AI Model Development

The plaintiffs in the case of Kadrey et al. vs. Meta have filed a motion alleging that the firm knowingly used copyrighted works in the development of its AI models. The motion, which was filed in the United States District Court in the Northern District of California, accuses Meta of systematically torrenting and stripping copyright management information (CMI) from pirated datasets, including works from the notorious shadow library LibGen.

Internal Memo Reveals LibGen’s True Nature

Documents recently submitted to the court describe practices that, according to the plaintiffs, implicate Meta’s senior leaders. A December 2024 memo from internal Meta discussions acknowledged LibGen as "a dataset we know to be pirated," and prompted debate about the ethical and legal ramifications of using such materials. The documents also show that top engineers hesitated to torrent the datasets, citing concerns about using corporate laptops for potentially unlawful activities.

Meta’s Practices Raise Concerns

The allegations portray Meta as a company that knowingly took part in large-scale piracy carried out through torrenting. According to a string of emails included as exhibits, Meta engineers expressed concerns about the optics of torrenting pirated datasets from within corporate spaces. One engineer noted that "torrenting from a [Meta-owned] corporate laptop doesn’t feel right," but despite that hesitation, the plaintiffs allege, the rapid downloading and distribution – or "seeding" – of pirated data took place.

Legal Ramifications

The case began as an intellectual property infringement action on behalf of authors and publishers claiming violations relating to AI use of their materials. The plaintiffs are now seeking to add two major claims to their suit: a violation of the Digital Millennium Copyright Act (DMCA) and a breach of the California Comprehensive Computer Data Access and Fraud Act (CDAFA).

Impact on Emerging Legislation

The case of Kadrey et al. vs. Meta could have far-reaching ramifications for the development of AI models, potentially setting legal precedents in the US and beyond. At the heart of this expanding legal battle lies growing concern over the intersection of copyright law and AI. Plaintiffs argue that stripping copyright management information from textual datasets denies rightful compensation to copyright owners and allows Meta to build AI systems like Llama on the financial ruins of authors’ and publishers’ creative efforts.

Conclusion

The case of Kadrey et al. vs. Meta highlights the need for clearer guidance at an international level to protect both creators and innovators. As AI becomes the central focus of Meta’s future strategy, the allegations of reliance on pirated libraries are unlikely to help its ambitions of maintaining leadership in the field.

FAQs

Q: What is the case about?
A: The case is about Meta’s alleged use of copyrighted works in the development of its AI models.

Q: What is LibGen?
A: LibGen (Library Genesis) is a shadow library that hosts pirated copies of books and other written works.

Q: What are the allegations against Meta?
A: Meta is accused of systematically torrenting and stripping copyright management information (CMI) from pirated datasets, including works from LibGen.

Q: What are the potential implications of this case?
A: The case could have far-reaching ramifications for the development of AI models and the intersection of copyright law and AI.
