Category «Copyright»

Search Millions of YouTube Videos Used to Train Generative AI

The Atlantic [no paywall] – “Editor’s note: This search tool is part of The Atlantic’s investigation into how YouTube videos are taken to train AI tools. You can read an analysis about these data sets here. This work is part of AI Watchdog, The Atlantic’s ongoing investigation into the generative-AI industry. (A note for users: …

Subjects: AI, Copyright, Internet, Search Engines

“First of its kind” AI settlement: Anthropic to pay authors $1.5 billion

Ars Technica: “Settlement shows AI companies can face consequences for pirated training data. Authors revealed today that Anthropic agreed to pay $1.5 billion and destroy all copies of the books the AI company pirated to train its artificial intelligence models. In a press release provided to Ars, the authors confirmed that the settlement is “believed …

Subjects: AI, Copyright, Courts, Internet, Legal Research, Libraries

AI web crawlers are destroying websites in their never-ending hunger for any and all content

The Register: “With AI’s rise, AI web crawlers are strip-mining the web in their perpetual hunt for ever more content to feed into their Large Language Model (LLM) mills. How much traffic do they account for? According to Cloudflare, a major content delivery network (CDN) force, 30% of global web traffic now comes from bots. …

Subjects: AI, Copyright, Cybercrime, Cybersecurity, Internet, Knowledge Management, Legal Research, Search Engines

The Interactive GenAI Legal Hallucination Tracker

“Coming Soon: The Interactive GenAI Legal Hallucination Tracker — Sneak Peek Today! August 10, 2025 by Jenny Wondracek – “If you follow me on LinkedIn or spoke with me at AALL, you’ve probably seen me teasing this project like it’s the season finale of a legal tech drama. Well, the wait is (almost) over — …

Subjects: AI, Copyright, Government Documents, Intellectual Property, Internet, Knowledge Management, Legal Research, Search Engines

AI industry horrified to face largest copyright class action ever certified

Ars Technica: “AI industry groups are urging an appeals court to block what they say is the largest copyright class action ever certified. They’ve warned that a single lawsuit raised by three authors over Anthropic’s AI training now threatens to “financially ruin” the entire AI industry if up to 7 million claimants end up joining …

Subjects: AI, Copyright, Courts, Legal Research

The Black Market for Fake Science Is Growing Faster Than Legitimate Research, Study Warns

Wired [no paywall] – “A new study by researchers at Northwestern University has set off alarm bells about the future of academic research, warning that the publication of fraudulent science is growing at a faster rate than that of legitimate research. Over the last four centuries, an implicit contract has been established between scientists and …

Subjects: Copyright, Education, Health Care, Internet, Knowledge Management, Search Engines

LEAKED: A New List Reveals Top Websites Meta Is Scraping of Copyrighted Content to Train Its AI

Drop Site: “Meta has scraped data from the most-trafficked domains on the internet —including news organizations, education platforms, niche forums, personal blogs, and even revenge porn sites—to train its artificial intelligence models, according to a leaked list obtained by Drop Site News. By scraping data from roughly 6 million unique websites, including 100,000 of the …

Subjects: AI, Copyright, Internet, Knowledge Management, Legal Research, Search Engines

OpenAI offers 20 million user chats in ChatGPT lawsuit. NYT wants 120 million.

Ars Technica: “OpenAI is preparing to raise what could be its final defense to stop The New York Times from digging through a spectacularly broad range of ChatGPT logs to hunt for any copyright-infringing outputs that could become the most damning evidence in the hotly watched case. In a joint letter (PDF) Thursday, both sides …

Subjects: AI, Copyright, Intellectual Property, Internet, Knowledge Management, Legal Research, Search Engines

LLRX July 2025 Issue – Articles and Columns

LLRX July 2025 Issue – Articles and Columns: The Trump Administration’s Continued War Against Science, Research and Public Health – Sabrina I. Pacifici’s overview of selected articles highlights the devastating impact of the Trump administration’s dismantling of agencies across the federal government, with a focus on cancelling critical scientific and health related research grants. The …

Subjects: AI, Censorship, Civil Liberties, Climate Change, Congress, Copyright, Cybercrime, Cybersecurity, Economy, Education, Energy, Environmental Law, Financial System, Government Documents, Health Care, Internet, Knowledge Management, Legal Research, Medicine, Privacy, Search Engines, Social Media

Perplexity is using stealth, undeclared crawlers to evade website no-crawl directives

Cloudflare: “We are observing stealth crawling behavior from Perplexity, an AI-powered answer engine. Although Perplexity initially crawls from their declared user agent, when they are presented with a network block, they appear to obscure their crawling identity in an attempt to circumvent the website’s preferences. We see continued evidence that Perplexity is repeatedly modifying their …

Subjects: AI, Copyright, Intellectual Property, Internet, Knowledge Management, Search Engines

Left-leaning Third Way maps counter plan to Trump’s AI agenda

Semafor – [reg. required to read all articles] “Trump on Wednesday [July 23, 2025] unveiled his long-awaited AI Action Plan, a national strategy accompanied by a trio of executive orders meant to replace his predecessor’s safety-first directives. The new roadmap focuses on deregulation and industry growth to better compete against China. In response, left-center policy …

Subjects: AI, Congress, Copyright, E-Records, Government Documents, Intellectual Property, Internet, Knowledge Management, Legal Research

What is AI Reading – Report by Muck Rack

Muck Rack Complete Report – Snipped from Executive Summary:
• Citations affect responses: Simply enabling or disabling the ability for AI to search the web drastically modifies responses, indicating that the systems are truly basing their responses on the cited works.
• Journalism and earned media are important drivers: More than 95% of links cited …

Subjects: AI, Copyright, E-Commerce, E-Government, Education, Government Documents, Intellectual Property, Internet, Knowledge Management, Legal Research, Search Engines, Social Media
