CJR: “Researchers routinely rely on websites like data.census.gov, but this month, the top of the site displays a banner that reads: ‘Due to the lapse of federal funding, this website is not being updated.’ Dozens of reports, from critical spending data to disease surveillance, are not being updated or have gone completely dark due to the ongoing government shutdown. Earlier this year, President Trump and Elon Musk’s funding cuts also led to the removal of publicly available datasets. Some of the datasets were restored after legal challenges, but others were not, leaving some researchers wondering what to do. For some, the answer lies in turning to news. At the Tow Center, we regularly rely on news data to understand how ‘pink slime’ networks function or how local news ecosystems evolve over time. Last week, at the New(s) Knowledge Symposium, a gathering of researchers, journalists, and technologists hosted by the Media Ecosystems Analysis Group, I found that many other researchers have been monitoring the news for data, often in unexpected ways.
- Maia Majumder, a computational epidemiologist at Boston Children’s Hospital, said that aggregated local news reports were critical for supplementing and fact-checking official sources in disease outbreaks. Because news is hyperlocal, Majumder told me in an interview, it is often more helpful for examining vaccination rates or spread of disease in a community compared with official sources that often only come at the state or national level. And if official data goes offline, news becomes even more critical. ‘News media help fill that gap by acting as our eyes and ears in the community,’ Majumder said. ‘This is a role the media plays even when the government isn’t shut down, because an open government doesn’t necessarily mean a transparent one.’ She previously worked at HealthMap, a platform relying partially on news to track disease outbreaks.
- Bia Carneiro, a research team leader at the Alliance of Bioversity International and CIAT, gave a talk at the conference on using news to monitor food scarcity and famine in the Global South. She described the work as ‘nowcasting,’ or monitoring for early warning signals of various events. In a 2024 paper, Carneiro and other researchers searched for signs of food insecurity using news as a complement to other data, like social media and Google Trends. She now applies that work to a famine early warning network called FEWS NET and uses it in other research on migration and climate. I spoke to Carneiro after the conference, and she said news data provides a helpful signal. ‘Social media is really noisy. With news, it’s more topical, and it’s a bit cleaner,’ Carneiro said. ‘It’s easier to pull out the relevant information that we want.’ Specifically, local news from various countries gave her an advantage over larger, more Eurocentric publications.
- This type of work depends on news aggregators like Media Cloud, an open-source platform that is maintained by the Media Ecosystems Analysis Group. Media Cloud crawls thousands of news websites, identifies if they have RSS feeds, and stores links to the sites’ articles—allowing researchers like Majumder and Carneiro to query the data and find trends.
- Processing news articles into data isn’t simple. Some publishers, local news outlets in particular, don’t provide RSS feeds. At Tow, we’re developing a tool that we are calling ‘Scraper Factories.’ It’s a Python package that supplements platforms like Media Cloud and uses large language models to quickly write Web scrapers. We plan to present it this December at the Computation + Journalism Symposium in Miami and hope it can help researchers in cases where a website doesn’t have an RSS feed.
- There are also commercial players in this space, like NewsCatcher and NewsData.io, that aggregate news feeds and charge for access to the data…”
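The feed-discovery step the excerpt attributes to Media Cloud (check whether a site advertises an RSS feed, then store links to its articles) can be sketched in a few lines of Python. This is a minimal illustration, not Media Cloud's actual code. It assumes a site advertises its feed with the common `<link rel="alternate" type="application/rss+xml">` tag in its HTML, and it parses only RSS 2.0 `<item><link>` entries, not Atom. The function names `discover_feeds` and `article_links` and the `example.org` URLs are hypothetical. Fetching and storage are omitted, so both functions operate on strings.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

# MIME types commonly used to advertise syndication feeds in a page's <head>.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collect feed URLs from <link rel="alternate" type="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        # HTMLParser lowercases attribute names; valueless attributes map to None.
        attr = {name: (value or "") for name, value in attrs}
        rels = attr.get("rel", "").lower().split()
        feed_type = attr.get("type", "").lower()
        href = attr.get("href", "")
        if "alternate" in rels and feed_type in FEED_TYPES and href:
            # Resolve relative hrefs such as "/feed.xml" against the page URL.
            self.feeds.append(urljoin(self.base_url, href))


def discover_feeds(html, base_url):
    """Return feed URLs advertised by an HTML page (empty list if none)."""
    finder = FeedLinkFinder(base_url)
    finder.feed(html)
    finder.close()
    return finder.feeds


def article_links(rss_xml):
    """Return the article URL from each <item><link> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    links = []
    for item in root.iter("item"):
        link = item.findtext("link")
        if link:
            links.append(link.strip())
    return links


if __name__ == "__main__":
    page = (
        '<html><head><link rel="alternate" type="application/rss+xml" '
        'href="/feed.xml"></head><body></body></html>'
    )
    feed = """<rss version="2.0"><channel><title>Example</title>
      <item><title>A</title><link>https://example.org/a</link></item>
      <item><title>B</title><link>https://example.org/b</link></item>
    </channel></rss>"""
    print(discover_feeds(page, "https://example.org/"))  # ['https://example.org/feed.xml']
    print(article_links(feed))  # ['https://example.org/a', 'https://example.org/b']
```

When `discover_feeds` returns an empty list, the site is one of the cases the excerpt describes, where no RSS feed exists and a scraper would be needed instead.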