Library Journal: “AI companies are offering some libraries funding for digitization projects, but archives and special collections are working through how to manage projects responsibly. “Imagine a world where you know things but cannot say where you learned them,” begins “Memory Without Origin,” a paper published in April by University of Virginia (UVA) Dean of Libraries and University Librarian Leo S. Lo. This isn’t a hypothetical question, Lo notes, it’s a predictable consequence if libraries allow generative artificial intelligence (AI) to ingest archival materials as training data without requiring provenance conditions. And libraries, which could always use funding for projects involving digitization, special collections, and archives, are being approached by AI companies with deep pockets. “They’ve been approaching a lot of larger research libraries, including Oxford and many more,” Lo tells LJ. (Oxford’s Bodleian Libraries began a digitization pilot project funded by ChatGPT maker OpenAI last year.) “Usually the offer is: they will pay you to digitize materials—which we want, because we want to make them more accessible—and in return, depending on the deal…they would like to have the data to train their AI models.”
These partnerships can benefit both parties, but for libraries, the consequences of getting these arrangements wrong “are more permanent than anything the profession has previously encountered,” Lo writes. “Once archival materials are absorbed into foundation model weights, no subsequent institutional action can remove them from the model.” If proper care isn’t taken, that information becomes unmoored from its former context within an archive…”