Business Insider – Its contractor left it wide open. “An internal spreadsheet obtained by Business Insider shows which websites Surge AI gig workers were told to mine — and which to avoid — while fine-tuning Anthropic’s AI to make it sound more ‘helpful, honest, and harmless.’ The spreadsheet allows sources like Bloomberg, Harvard University, and the New England Journal of Medicine while blacklisting others like The New York Times and Reddit. Anthropic says it wasn’t aware of the spreadsheet and said it was created by a third-party vendor, the data-labeling startup Surge AI, which declined to comment on this point. ‘This document was created by a third-party vendor without our involvement,’ an Anthropic spokesperson said. ‘We were unaware of its existence until today and cannot validate the contents of the specific document since we had no role in its creation.’ Frontier AI companies mine the internet for content and often work with startups with thousands of human contractors, like Surge, to refine their AI models. In this case, project documents show Surge worked to make Anthropic’s AI sound more human, avoid ‘offensive’ statements, and cite documents more accurately. Many of the whitelisted sources copyright or otherwise restrict their content. The Mayo Clinic, Cornell University, and Morningstar, whose main websites were all listed as ‘sites you can use,’ told BI they don’t have any agreements with Anthropic to use this data for training AI models. Surge left a trove of materials detailing its work for Anthropic, including the spreadsheet, accessible to anyone with the link on Google Drive. Surge locked down the documents shortly after BI reached out for comment. ‘We take data security seriously, and documents are restricted by project and access level where possible,’ a Surge spokesperson said. ‘We are looking closely into the matter to ensure all materials are protected.’…”