Fabricated citations: an audit across 2·5 million biomedical papers

Topaz M, Roguin N, Gupta P, et al. Fabricated citations: an audit across 2·5 million biomedical papers. Volume 407, Issue 10541, p1779-1781, May 09, 2026.
“Scientific literature depends on the integrity of its references. Each reference implicitly asserts that a verifiable source exists and supports the claims being made. When references point to non-existent studies, readers, reviewers, and policy makers are unable to evaluate the evidence. Fabricated references (references whose claimed titles correspond to no existing publication) can arise from paper mill activity, intentional misconduct, or uncritical use of artificial intelligence (AI) writing tools. Large language models (LLMs) generate plausible sounding but fictitious references, a well documented failure mode; previous studies estimate that 30–69% of LLM generated references in biomedical contexts are fabricated. These references are often correctly formatted, attributed to real researchers, and bear plausible publication dates, making them difficult to detect by conventional peer review. To our knowledge, no systematic audit of reference integrity across the biomedical literature has been conducted until now. We present findings from a reference-integrity audit of 2·5 million biomedical papers spanning 3 years, showing that fabricated references are embedded in the peer-reviewed literature at scale, and that the rate of fabrication is accelerating. We developed an automated reference verification system scanning PubMed Central’s Open Access subset from Jan 1, 2023, to Feb 18, 2026: 2 471 758 papers and 125 615 773 structured references. We extracted references from full-text extensible markup language, retaining those with a PubMed identifier (PMID). Of 125·6 million references, 97·1 million (77%) carried a PMID and were verified; the remaining 23%, predominantly non-indexed references to websites, books, and grey literature, were excluded.
For each verified reference, we retrieved the bibliographical record for the claimed identifier from PubMed and Crossref and compared it with the citing paper’s claimed metadata with the use of text-similarity scoring, and mismatches were flagged. Flagged references underwent sequential filters to minimise false positives: automated pattern detection removed parsing artefacts, and an LLM (Claude 3.5 Haiku; Anthropic, San Francisco, CA, USA) screened remaining candidates to distinguish genuine fabrications from formatting discrepancies such as informally abbreviated titles. For example, a reference listed as Depression and anxiety in young adults with ID corresponds to the real indexed title Depression and anxiety symptoms during the transition to early adulthood for people with intellectual disabilities and is probably a reference error, not a fabrication. The model was applied zero-shot without fine-tuning or modification of model weights. References passing all filters were verified against PubMed (approximately 37 million records), Crossref (more than 160 million digital object identifiers), OpenAlex (more than 250 million scholarly works), and Google Scholar (which indexes journals, preprints, conference proceedings, theses, and grey literature). A reference not found in any database was classified as a fabricated reference; one found but linked to an incorrect identifier was a reference error (appendix p 2–4). Precision of our automated reference verification system was 91% (Fleiss’ κ=0·71, indicating moderate agreement in about seven of every ten cases), measured in a 500-entry masked validation with three independent reviewers; this design estimates precision but not recall. Among 97·1 million verified references, we identified 4046 fabricated references across 2810 papers (illustrative examples are shown in the appendix p 5–6). In 2023, approximately one in 2828 papers contained at least one fabricated reference. 
By 2025, this had risen to one in 458 and in the first 7 weeks of 2026, one in 277 papers had at least one fabricated reference. The fabrication rate increased more than 12 times, from approximately four per 10 000 papers in 2023, to 51·3 per 10 000 papers in the fourth quarter of 2025, reaching 56·9 per 10 000 papers in early 2026 (figure)…”
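The first step the authors describe, comparing the citing paper's claimed title with the record retrieved for the claimed identifier and flagging mismatches, can be illustrated with a rough sketch. The excerpt does not say which similarity measure or cutoff the authors used, so everything here is an assumption for illustration: the function names are hypothetical, Python's standard-library `difflib.SequenceMatcher` stands in for their text-similarity scoring, and the 0.8 threshold is arbitrary.

```python
import re
from difflib import SequenceMatcher


def normalize_title(title: str) -> str:
    """Lowercase, replace anything other than a-z, 0-9, and spaces with a space,
    then collapse runs of whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", title.lower())
    return " ".join(cleaned.split())


def title_similarity(claimed: str, indexed: str) -> float:
    """Return a similarity ratio in [0, 1] between two normalized titles."""
    matcher = SequenceMatcher(None, normalize_title(claimed), normalize_title(indexed))
    return matcher.ratio()


def is_flagged(claimed: str, indexed: str, threshold: float = 0.8) -> bool:
    """Flag a reference for further screening when its claimed title is
    too dissimilar from the indexed title.

    The 0.8 default is an assumed value for illustration, not one taken
    from the paper.
    """
    return title_similarity(claimed, indexed) < threshold


if __name__ == "__main__":
    # Title pair quoted in the excerpt as a reference error, not a fabrication.
    claimed = "Depression and anxiety in young adults with ID"
    indexed = (
        "Depression and anxiety symptoms during the transition to early "
        "adulthood for people with intellectual disabilities"
    )
    print(round(title_similarity(claimed, indexed), 2), is_flagged(claimed, indexed))
```

Under these assumptions, the informally abbreviated title from the excerpt is flagged even though the source exists. That is why a flag alone cannot settle the question, and why the authors add later filters (pattern detection, LLM screening, and lookups across several databases) before calling a reference fabricated.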