Accurate, Focused Research on Law, Technology and Knowledge Discovery Since 2002

Category Archives: Copyright

Condé Nast, other news orgs say AI firm stole articles, spit out “hallucinations”

Ars Technica: “Condé Nast and several other media companies sued the AI startup Cohere today, alleging that it engaged in ‘systematic copyright and trademark infringement’ by using news articles to train its large language model. ‘Without permission or compensation, Cohere uses scraped copies of our articles, through training, real-time use, and in outputs, to power its artificial intelligence (“AI”) service, which in turn competes with Publisher offerings and the emerging market for AI licensing,’ said the lawsuit filed in US District Court for the Southern District of New York. ‘Not content with just stealing our works, Cohere also blatantly manufactures fake pieces and attributes them to us, misleading the public and tarnishing our brands.’ Condé Nast, which owns Ars Technica and other publications such as Wired and The New Yorker, was joined in the lawsuit by The Atlantic, Forbes, The Guardian, Insider, the Los Angeles Times, McClatchy, Newsday, The Plain Dealer, Politico, The Republican, the Toronto Star, and Vox Media. The complaint seeks statutory damages of up to $150,000 under the Copyright Act for each infringed work, or an amount based on actual damages and Cohere’s profits. It also seeks ‘actual damages, Cohere’s profits, and statutory damages up to the maximum provided by law’ for infringement of trademarks and ‘false designations of origin.’ In Exhibit A, the plaintiffs identified over 4,000 articles in what they called an ‘illustrative and non-exhaustive list of works that Cohere has infringed.’ Additional exhibits provide responses to queries and ‘hallucinations’ that the publishers say infringe upon their copyrights and trademarks. The lawsuit said Cohere ‘passes off its own hallucinated articles as articles from Publishers.’…”

Meta torrented over 81.7TB of pirated books to train AI, authors say

Ars Technica: “Newly unsealed emails allegedly provide the ‘most damning evidence’ yet against Meta in a copyright case raised by book authors alleging that Meta illegally trained its AI models on pirated books. Last month, Meta admitted to torrenting a controversial large dataset known as LibGen, which includes tens of millions of pirated books. But…

Public Domain Image Archive

From The Public Domain Review – Explore our hand-picked collection of 10,046 out-of-copyright works, free for all to browse, download, and reuse. This is a living database with new images added every week. Users can search the collection, browse the categories, or enter the “Infinite View.” Via Colossal: “While The Public Domain Review primarily takes the form…

US Copyright Office rules out copyright for AI-created content without human input

TechSpot: “The US Copyright Agency is publishing a series of reports about the relationship between copyright and AI. Despite the complexity of the issue, the organization has already said that AI-based works with no human intervention cannot enjoy copyright protection at all. Movies and other complex works created through AI means cannot be copyrighted, except…

Developer Creates Infinite Maze That Traps AI Training Bots

404 Media – “A pseudonymous coder has created and released an open source “tar pit” to indefinitely trap AI training web crawlers in an infinitely, randomly-generating series of pages to waste their time and computing power. The program, called Nepenthes after the genus of carnivorous pitcher plants which trap and consume their prey, can be…

Wikipedia:Database download

Wikipedia offers free copies of all available content to interested users. These databases can be used for mirroring, personal use, informal backups, offline use or database queries (such as for Wikipedia:Maintenance). All text content is licensed under the Creative Commons Attribution-ShareAlike 4.0 License (CC-BY-SA), and most is additionally licensed under the GNU Free Documentation License…

Strict Scrutiny

“A podcast about the United States Supreme Court and the legal culture that surrounds it. Hosted by three badass constitutional law professors – Leah Litman, Kate Shaw, and Melissa Murray – Strict Scrutiny provides in-depth, accessible, and irreverent analysis of the Supreme Court and its cases, culture, and personalities. Each week, Leah, Kate, and Melissa break down…

Meta Secretly Trained Its AI on a Notorious Piracy Database

Wired – [unpaywalled] Newly Unredacted Court Docs Reveal – One of the most important AI copyright legal battles just took a major turn: “Meta just lost a major fight in its ongoing legal battle with a group of authors suing the company for copyright infringement over how it trained its artificial intelligence models. Against…

Announcing the Public Domain Image Archive

“After the hundreds (thousands?) of hours trawling through online image collections since the PDR’s inception, we’ve decided it was time to create one of our own! We are really excited to share with you the launch of our new sister-project, the Public Domain Image Archive (PDIA), a curated collection of more than 10,000 out-of-copyright historical…

LLRX December 2024 Articles and Columns

December 2024 – LLRX.com® – the free web journal on law, technology, knowledge discovery and research for Librarians, Lawyers, Researchers, Academics, and Journalists. Founded in 1996. January 1, 2025 is Public Domain Day: Works from 1929 are open to all, as are sound recordings from 1924 – by Jennifer Jenkins. AI in Finance and Banking,…