Data is Beautiful – JoonSimJoon: When researchers write “recent studies show…”, how recent is recent, really? I scraped 749,853 references from 19,108 papers across 200 academic fields using OpenAlex data to find out.
TL;DR:
- Average “recent” = about 5 years
- Virology/Pandemic research: 2 years (half their citations are from the last 2 years!)
- Philosophy/History: 7-10 years
- Humanities fields: 50%+ of their “recent” citations are 10+ years old
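The per-field numbers in the TL;DR come down to grouping citation lags by field and averaging them. Here is a minimal sketch. It assumes each reference has already been reduced to a (field, lag) pair. `mean_lag_by_field` is a hypothetical helper, and the toy rows are invented rather than taken from the post's data. The post doesn't say whether "average" means the mean or the median (the virology line reads like a median), so swap in `statistics.median` if that's what you're after.

```python
from collections import defaultdict
from statistics import mean


def mean_lag_by_field(rows):
    """Average citation lag per field.

    rows: iterable of (field, lag_in_years) pairs, one per reference.
    """
    lags_by_field = defaultdict(list)
    for field, lag in rows:
        lags_by_field[field].append(lag)
    # mean() could be swapped for median() without changing anything else
    return {field: mean(lags) for field, lags in lags_by_field.items()}


# Toy input for illustration only, not real OpenAlex data
rows = [("Virology", 0), ("Virology", 2), ("Virology", 4),
        ("Philology", 8), ("Philology", 14)]
print(mean_lag_by_field(rows))
```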
The most interesting findings:
- Virology is FAST – 52.8% of citations are ≤2 years old. Makes sense given COVID.
- Philology lives in the past – 51.6% of citations are ≥10 years old. When you’re studying ancient texts, “recent” is relative.
- Same-year citations – 4.3% of all references were published in the same year as the paper citing them (lag = 0). Preprints are likely a big part of that.
- Maximum lag found: 50 years, in a Natural Language Processing paper. A paper with “recent” in its abstract cited something from the 1970s lol.
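The findings above are simple bucket counts over the lag distribution. Here is a minimal sketch that assumes you already have one integer lag per reference. `lag_summary` is a hypothetical helper, and the input list is made up. The post doesn't say whether its ≤2 bucket excluded negative lags, so this version simply counts any that appear.

```python
def lag_summary(lags):
    """Bucket shares and max for a list of citation lags (citing_year - cited_year)."""
    if not lags:
        raise ValueError("need at least one citation lag")
    n = len(lags)
    return {
        # lag == 0: cited work came out the same year as the citing paper
        "share_same_year": sum(1 for lag in lags if lag == 0) / n,
        # lag <= 2: this also counts negative lags, if the data contains any
        "share_le_2y": sum(1 for lag in lags if lag <= 2) / n,
        "share_ge_10y": sum(1 for lag in lags if lag >= 10) / n,
        "max_lag": max(lags),
    }


# Toy lags, invented for illustration
print(lag_summary([0, 1, 2, 5, 10, 50]))
```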
Methodology:
- Searched for papers published 2020-2024 with “recent” in the abstract
- Extracted all their references
- Calculated citation lag = citing_year - cited_year
- Used the OpenAlex API (free and open!)
- Inspired by the BMJ paper “How recent is recent?”, which did this for medical fields only.
- Full code and data: https://github.com/JoonSimJoon/How-current-is-recent
- Tools: Python, OpenAlex API, geopandas for maps
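The methodology steps map onto a small OpenAlex pipeline. This is a minimal sketch, not the code in the linked repo. It assumes several API features: the `abstract.search` and `publication_year` filters, cursor paging, the `select` parameter, and an `openalex:` ID filter that accepts up to 50 OR'd values per request. Check these against the OpenAlex API docs before relying on them. The helper names (`build_works_url`, `iter_citing_works`, `lookup_years`, `citation_lags`) are hypothetical. Assigning each paper to a field is left out.

```python
import json
import urllib.parse
import urllib.request

OPENALEX_WORKS = "https://api.openalex.org/works"


def build_works_url(filter_expr, select, cursor=None, per_page=200):
    """Build a /works query URL; cursor is only added when paging."""
    params = {"filter": filter_expr, "select": select, "per-page": per_page}
    if cursor is not None:
        params["cursor"] = cursor
    return f"{OPENALEX_WORKS}?{urllib.parse.urlencode(params)}"


def fetch_json(url):
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def short_id(work_id):
    """'https://openalex.org/W123' -> 'W123' (form assumed for the ID filter)."""
    return work_id.rsplit("/", 1)[-1]


def iter_citing_works(start_year=2020, end_year=2024, max_pages=5):
    """Yield works with 'recent' in the abstract, following cursor pages."""
    filter_expr = f"abstract.search:recent,publication_year:{start_year}-{end_year}"
    cursor = "*"
    for _ in range(max_pages):
        page = fetch_json(build_works_url(
            filter_expr, "id,publication_year,referenced_works", cursor))
        yield from page["results"]
        cursor = page["meta"].get("next_cursor")
        if not cursor:
            break


def lookup_years(work_ids, batch=50):
    """Map full OpenAlex work IDs to publication years, in batches of IDs."""
    years = {}
    ids = sorted({short_id(w) for w in work_ids})
    for i in range(0, len(ids), batch):
        chunk = ids[i:i + batch]
        page = fetch_json(build_works_url(
            "openalex:" + "|".join(chunk), "id,publication_year"))
        for work in page["results"]:
            if work.get("publication_year") is not None:
                years[work["id"]] = work["publication_year"]
    return years


def citation_lags(citing, cited_years):
    """citing_year - cited_year for every reference whose year is known."""
    year = citing["publication_year"]
    refs = citing.get("referenced_works") or []
    return [year - cited_years[ref] for ref in refs if ref in cited_years]
```

A run would take each work from `iter_citing_works()`, pass its `referenced_works` to `lookup_years`, and feed both into `citation_lags`. Batching the lookups across many citing works cuts down on requests. OpenAlex also asks heavy users to add a `mailto` parameter so their traffic goes to the "polite pool".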