I built an explorer of 25+ years of New York Times coverage – 1.5B words and 2.2M articles

Below the Fold: “I used data from the New York Times API and archive to build an explorer of the paper’s coverage over the last 25+ years: 1.5 billion words across 2.2 million articles by about 26,000 reporters. You can use it to look at:

  • which reporters covered which beats
  • who shared bylines with whom
  • article frequency and length
  • headline-word frequency over time
  • section comparisons
  • U.S. and global coverage patterns

A few things that jump out at me:

  • to the surprise of no one, Maggie Haberman dominates recent byline counts
  • Trump dominates headlines compared to other recent presidents, even when out of office
  • Iowa surges every four years
  • China coverage peaked around 2014
  • India looks relatively under-covered on a per-capita basis

I began this in Python a couple of years ago during the Lede Program at Columbia Journalism School and recently revived it, using Claude Code for a lot of the grunt work. Any errors are mine. Let me know what you think! Explorer: https://tedalcorn.github.io/nyt/”

Posted in: Internet, Knowledge Management