The LibGen data set – what authors can do

Society of Authors: “Meta has used millions of pirated books to develop its AI programmes. Yesterday (20 March 2025), The Atlantic published a searchable database of over 7.5 million books and 81 million research papers. This data set, called Library Genesis or ‘LibGen’ for short, is full of pirated material, and all of it has been used to develop AI systems by tech giant Meta. The Atlantic says that court documents show that staff at Meta discussed licensing books and research papers lawfully but instead chose to use stolen work because it was faster and cheaper. Given that Meta Platforms, Inc, the parent company of Facebook, Instagram and WhatsApp, has a market capitalisation of £1.147 trillion, this is appalling behaviour. According to The Atlantic, Meta argued that it could then use the US’s ‘fair use exception’ defence if it was challenged legally. It is not yet clear whether scraping from copyright works without permission is unlawful under the US fair use exception to copyright, but if that scraping is for commercial purposes (which what Meta is doing surely is) it cannot be fair use. Under the UK fair dealing exception to copyright, there is no question that scraping is unlawful without permission. As a matter of urgency, Meta needs to compensate the rightsholders of all the works it has been exploiting. This is yet more evidence of the catastrophic impact generative AI is having on our creative industries worldwide. From development through to output, creators’ rights are being ignored, and governments need to intervene to protect authors’ rights. In the UK, and globally, we need to see strong legislation from governments to uphold and strengthen copyright law, ensure transparency and fair payment, and to penalise big tech companies who ride roughshod over the law.”

Posted in: Copyright, Education, Internet, Knowledge Management, Legal Research, Libraries