AI Scraping Bots Are Breaking Open Libraries, Archives, and Museums

404 Media: “This is a moment where that community feels collectively under threat and isn’t sure what the process is for solving the problem.” “Popular Information AI bots that scrape the internet for training data are hammering the servers of libraries, archives, museums, and galleries, and are in some cases knocking their collections offline, according to a new survey published today. While the impact of AI bots on open collections has been reported anecdotally, the survey is the first attempt at measuring the problem, which in the worst cases can make valuable, public resources unavailable to humans because the servers they’re hosted on are being swamped by bots scraping the internet for AI training data. AI bots that scrape the internet for training data are hammering the servers of libraries, archives, museums, and galleries, and are in some cases knocking their collections offline, according to a new survey published today. While the impact of AI bots on open collections has been reported anecdotally, the survey is the first attempt at measuring the problem, which in the worst cases can make valuable, public resources unavailable to humans because the servers they’re hosted on are being swamped by bots scraping the internet for AI training data.  “I’m confident in saying that this problem is widespread, and there are a lot of people and institutions who are worried about it and trying to think about what it means for the sustainability of these resources,” the author of the report, Michael Weinberg, told me. “A lot of people have invested a lot of time not only in making these resources available online, but building the community around institutions that do it. And this is a moment where that community feels collectively under threat and isn’t sure what the process is for solving the problem.”

The report, titled “Are AI Bots Knocking Cultural Heritage Offline?” was written by Weinberg of the GLAM-E Lab, a joint initiative between the Centre for Science, Culture and the Law at the University of Exeter and the Engelberg Center on Innovation Law & Policy at NYU Law, which works with smaller cultural institutions and community organizations to build open access capacity and expertise. GLAM is an acronym for galleries, libraries, archives, and museums. The report is based on a survey of 43 institutions with open online resources and collections in Europe, North America, and Oceania. Respondents also shared data and analytics, and some followed up with individual interviews. The data is anonymized so institutions could share information more freely, and to prevent AI bot operators from undermining their countermeasures…”

Posted in: AI, Copyright, Digital Rights, Education, Internet, Knowledge Management, Legal Research, Libraries