Category «Internet»

Outlier and collapse: The Enron corpus and foundation model training data

Zimmer, Z. (2026). Outlier and collapse: The Enron corpus and foundation model training data. Big Data & Society, 13(1). https://doi.org/10.1177/20539517261421474 (Original work published 2026) – “The Enron Corpus is a canonical training dataset representing one of the first scale jumps in the size of natural language data for machine learning (ML) research. That corpus was …

Subjects: AI, Economy, Education, Energy, Financial System, Internet, Knowledge Management, Legal Research

Internet Archive – Search CIA World Factbook Collection

Follow-up to “CIA ends publication of its popular World Factbook reference tool.” See also Internet Archive Wayback Machine – Search CIA World Factbook collection: Search 18,004 web pages using an index built from anchor text, all URL parts (file names, hosts, domains), MIME types, language, HTTP status codes and the full text of the …

Subjects: E-Records, Government Documents, Internet, Knowledge Management, Legal Research

The Chatbots Appear to Be Organizing

Vox – AI agents could change your life — if they don’t ruin it first. ChatGPT is boring compared to what comes next. The Atlantic [no paywall] – “The first signs of the apocalypse might look a little like Moltbook: a new social-media platform, launched last week, that is supposed to be populated exclusively by …

Subjects: AI, Economy, Financial System, Internet, Knowledge Management, Legal Research, Privacy, Search Engines, Social Media

Follow the Changes: 9 Ways Web Archives are Used in Digital Investigations

Internet Archive Blogs: “Digital journalists increasingly turn to web archives like the Wayback Machine to follow how things on the Internet break, change or disappear – from deleted posts to quietly edited pages. The web has become not only a source of information but also the subject of media investigations, prompting journalists, researchers and activists …

Subjects: E-Government, E-Records, Government Documents, Internet, Knowledge Management, Legal Research, Search Engines

Data Checkup: A health check for federal data collections

The dataindex.us team is excited to launch the Data Checkup – a comprehensive framework for assessing the health of federal data collections, highlighting key dimensions of risk and presenting a clear status of data well-being. When we started dataindex.us, one of our earliest tools was a URL tracker: a simple way to monitor whether a …

Subjects: Censorship, E-Government, E-Records, Freedom of Information, Government Documents, Internet, Knowledge Management, Legal Research

Wayback Machine debuts a new plug-in designed to fix the internet’s broken links problem

TechCrunch: “The Internet Archive is a nonprofit that — as you might expect — is devoted to archiving the internet and preserving digital context for future generations. This week, the platform announced a new tool designed to expand on that mission by helping the world’s WordPress users keep their articles in peak digital health. The …

Subjects: Internet, Knowledge Management, Legal Research

Millions of books died so Claude could live

The Verge: “When ChatGPT launched, in November of 2022, it started a race that almost immediately consumed the tech industry. OpenAI didn’t invent the concept of AI, but most of the state-of-the-art technology was confined to research labs at companies and institutions around the world. Then, suddenly, it was everywhere. And better than anyone expected. …

Subjects: AI, Copyright, Education, Intellectual Property, Internet, Knowledge Management, Legal Research, Search Engines

The Cambridge Online Trust & Safety Index

“The Cambridge Online Trust & Safety Index (COTSI) tracks the dynamics of the fake SMS-verifications market across different platforms and countries. The Index aims to answer the following questions: How easy (and cheap) is it to engage in online manipulation? Is the situation with fake accounts improving or worsening over time? Which online platforms are …

Subjects: Censorship, Civil Liberties, E-Records, Economy, Financial System, Internet, Knowledge Management, Legal Research, Social Media

Donald Trump Has Built a Clicktatorship

The Atlantic [no paywall]: Even the administration’s budget proposals read like Truth Social posts. “…No one better exemplifies the clicktatorship than the president himself. Trump routinely makes policy announcements via social media. Consider when, in August, he attempted to fire the Federal Reserve Board member Lisa Cook on Truth Social. When a government lawyer was …

Subjects: E-Government, E-Records, Government Documents, Internet, Knowledge Management, Legal Research, Social Media