Data manipulation within the US Federal Government, Freilich, Janet et al. The Lancet. Published July 3, 2025. DOI: 10.1016/S0140-6736(25)01249-8

“A US Department of Veterans Affairs dataset compiling veteran health-care use in 2021 was quietly amended on March 5, 2025. A column titled gender was renamed sex, and the words were also switched in the dataset title and description. Before March 5, the dataset had not been modified since it was published in 2022. As of May 1, the dataset change log, in which modifications should be tracked, is empty. The switch from gender to sex also occurred in other public health datasets, including US Centers for Disease Control and Prevention (CDC) datasets tracking global adult tobacco consumption, stroke mortality data from 2015 to 2017, and a survey of nutrition, physical activity, and obesity. The agencies involved have not issued any statements confirming or explaining these changes, but they could be intended to comply with a Presidential directive for agencies to remove ‘messages that promote or otherwise inculcate gender ideology’.

Public health researchers, scientists, and medical practitioners rely heavily on government datasets for research and clinical practice. Following a global trend towards an open government, the US 2019 OPEN Government Data Act empowered federal agencies to make datasets publicly available. The US Government’s main data repository now hosts hundreds of thousands of datasets.

Data manipulation by the US Government, particularly when hidden, is a crisis: it makes crucial datasets untrustworthy and unusable. If the US Government secretly changes datasets for political reasons, researchers relying on the data might erroneously recommend ineffective or counterproductive interventions. Further, such changes, when discovered, reduce trust in the data that underlie public health and, consequently, health interventions. 
This reduction in trust hinders the progress of science, medicine, and public health, and reduces individual willingness to rely on expert recommendations. It is also a crisis for international researchers who depend on US Government datasets and data infrastructure. But there are potential solutions and actions that researchers around the world can take.

We gathered metadata from the US Department of Health and Human Services, CDC, and Veterans Affairs database harvest sources (metadata inventories of the agency datasets), and selected databases that were modified between Jan 20 and March 25, 2025. We excluded duplicates, datasets that had no archived copies for comparison or were otherwise unavailable, and datasets routinely updated monthly or more frequently. The final cohort included 232 datasets. We manually compared each dataset to archived versions hosted by the Internet Archive. We tracked alterations to words only, not numbers in the data. We did not track changes to the US Government websites other than those hosting the datasets. Full methodological details are in the appendix (pp 2–3).
We found that 114 (49%) of the 232 included datasets were substantially altered. Of these, the vast majority (106 datasets [93%]) had the word gender switched to sex (appendix p 2). Only 15 (13%) of the 114 altered datasets logged or otherwise indicated that the change had occurred. Alterations in 89 (78%) of the datasets were to the classification or categorisation of the data, such as column headers or stratification categories, and alterations in the remaining 25 (22%) were to descriptions of the data such as tags or narrative introductions to the dataset. The alterations span the studied period. Of the 114 datasets with substantial changes, 4 (4%) were altered between Jan 20 and Jan 31, 2025; 30 (26%) were altered between Feb 1 and Feb 28, 2025; and 82 (72%) between March 1 and March 25, 2025. In 28 (25%) of the altered datasets, the change made the data descriptions more consistent. In these cases, the word gender had been applied to data also labelled as sex (eg, a stratification category labelled gender while the underlying data column was titled sex; after the change, only the word sex remained)…”
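
The excerpt says the authors compared each dataset manually against archived copies and tracked changes to words only. For readers who want to run a similar check on their own copies, the following is a minimal sketch, not the authors' method. It assumes you already have the archived and current text of a dataset's title, description, or column headers as plain strings. The function names `tokenize` and `count_term_swaps` are hypothetical, and the default term pair (gender, sex) is taken from the excerpt.

```python
import re
from difflib import SequenceMatcher


def tokenize(text):
    """Lowercase the text and keep alphabetic words only (numbers are ignored)."""
    return re.findall(r"[a-z]+", text.lower())


def count_term_swaps(archived, current, old="gender", new="sex"):
    """Count positions where the word `old` in the archived text
    was replaced by the word `new` in the current text.

    Hypothetical helper for illustration; a real audit would still need
    manual review of every flagged dataset.
    """
    a, c = tokenize(archived), tokenize(current)
    matcher = SequenceMatcher(a=a, b=c, autojunk=False)
    swaps = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Only one-for-one word replacements can be paired up safely.
        if tag == "replace" and (i2 - i1) == (j2 - j1):
            swaps += sum(
                1 for x, y in zip(a[i1:i2], c[j1:j2]) if x == old and y == new
            )
    return swaps


# Illustrative strings, not real dataset metadata.
archived_text = "Health-care use by gender, 2021. Columns: state, gender, visits"
current_text = "Health-care use by sex, 2021. Columns: state, sex, visits"
print(count_term_swaps(archived_text, current_text))
```

A word-level diff is used here rather than a plain substring count so that text which already contained the word sex before the change is not counted as a swap.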