‘Torrenting From a Corporate Laptop Doesn’t Feel Right’
![‘Torrenting From a Corporate Laptop Doesn’t Feel Right’](https://cdn.arstechnica.net/wp-content/uploads/2025/02/GettyImages-459404787-1152x648.jpg)
Ashley Belanger, reporting for Ars Technica on new details from an authors group lawsuit alleging Meta trained its AI models on a trove of pirated books:
Last month, Meta admitted to torrenting a controversial large dataset known as LibGen, which includes tens of millions of pirated books. But details around the torrenting were murky until yesterday, when Meta’s unredacted emails were made public for the first time. The new evidence showed that Meta torrented “at least 81.7 terabytes of data across multiple shadow libraries through the site Anna’s Archive, including at least 35.7 terabytes of data from Z-Library and LibGen,” the authors’ court filing said. And “Meta also previously torrented 80.6 terabytes of data from LibGen.”
Supposedly, Meta tried to conceal the seeding by not using Facebook servers while downloading the dataset to “avoid” the “risk” of anyone “tracing back the seeder/downloader” from Facebook servers, an internal message from Meta researcher Frank Zhang said, while describing the work as being in “stealth mode.” Meta also allegedly modified settings “so that the smallest amount of seeding possible could occur,” a Meta executive in charge of project management, Michael Clark, said in a deposition.
Now that new information has come to light, authors claim that Meta staff involved in the decision to torrent LibGen must be deposed again because the new facts allegedly “contradict prior deposition testimony.” Mark Zuckerberg, for example, claimed to have no involvement in decisions to use LibGen to train AI models. But unredacted messages show the “decision to use LibGen occurred” after “a prior escalation to MZ,” authors alleged.
Regardless of how you feel about AI training on public data, you have to be a zealot not to acknowledge that a lot of stuff falls into a gray zone. Torrenting 81 terabytes of pirated books is not in the gray zone. It’s hilarious to imagine Zuckerberg giving the OK to pirate all these books, just not from the office.