Meta’s Shadow Datasets: A Get-Out-of-Jail-Free Card?
The plaintiffs in a lawsuit against Meta, the authors of a book, have filed a motion to file a third amended complaint, alleging that Meta has treated the “public availability” of shadow datasets as a get-out-of-jail-free card.
A False Premise?
Meta’s opposition to the motion argues that the authors’ attempts to add additional claims to the case are an “eleventh-hour gambit based on a false and inflammatory premise” and denies that Meta waited to reveal crucial information in discovery.
LibGen: A Pirated Dataset
The plaintiffs allege that internal Meta records show every relevant decision-maker at Meta, up to and including its CEO, Mark Zuckerberg, knew LibGen was “a dataset we know to be pirated.” LibGen is an archive of books uploaded to the internet that originated in Russia around 2008 and is one of the largest and most controversial “shadow libraries” in the world.
Discovery Woes
Meta’s argument hinges on its claim that the plaintiffs already knew about the LibGen use and shouldn’t be granted additional time to file a third amended claim when they had ample time to do so before discovery ended in December 2024. However, the plaintiffs argue that the discovery process has unearthed reasons to add new allegations, including evidence that Meta uploaded pirated files containing the authors’ works on “torrent” sites.
A New Allegation
The unredacted documents claim that Meta, through a corporate representative who testified on November 20, 2024, has now admitted under oath to uploading pirated files containing the authors’ works on “torrent” sites. This, the plaintiffs argue, turned Meta itself into a distributor of the very same pirated copyrighted material that it was also downloading for use in its commercially available AI models.
Conclusion
The plaintiffs’ motion to file a third amended complaint alleges that Meta has treated the “public availability” of shadow datasets as a get-out-of-jail-free card. The discovery process has unearthed new evidence that Meta uploaded pirated files containing the authors’ works on “torrent” sites, turning Meta into a distributor of pirated copyrighted material. The outcome of this case remains to be seen, but it highlights the ongoing controversy surrounding shadow libraries and their use in AI development.
FAQs
Q: What is LibGen?
A: LibGen is an archive of books uploaded to the internet that originated in Russia around 2008 and is one of the largest and most controversial “shadow libraries” in the world.
Q: What is a shadow library?
A: A shadow library is a collection of copyrighted works that are uploaded to the internet without the permission of the copyright holders.
Q: What is the Digital Millennium Copyright Act (DMCA)?
A: The DMCA is a US law introduced in 1998 to stop people from selling or duplicating copyrighted works on the internet.
Q: What is the outcome of this case likely to be?
A: The outcome of this case remains to be seen, but it highlights the ongoing controversy surrounding shadow libraries and their use in AI development.

