Accurate, Focused Research on Law, Technology and Knowledge Discovery Since 2002

Investigating With Databases: Verifying Data Quality

“The Verification Handbook for Investigative Reporting is a new guide to online search and research techniques for using user-generated content and open source information in investigations. Published by the European Journalism Centre, a GIJN member based in the Netherlands, the manual consists of ten chapters and is available for free download. We’re pleased to reprint below Chapter 5, by investigative journalist Giannina Segnini.


Never before have journalists had so much access to information. More than three exabytes of data (equivalent to 750 million DVDs) are created every day, and that number doubles every 40 months. Global data production is today being measured in yottabytes. (One yottabyte is equivalent to 250 trillion DVDs of data.) Discussions are already underway about the new unit of measurement needed once we surpass the yottabyte. The rise in the volume and speed of data production might be overwhelming for many journalists, many of whom are not used to working with large amounts of data for research and storytelling. But the urgency and eagerness to make use of data, and the technology available to process it, should not distract us from our underlying quest for accuracy. To fully capture the value of data, we must be able to distinguish between questionable and quality information, and be able to find real stories amid all of the noise. One important lesson I’ve learned from two decades of using data for investigations is that data lies, just as much as people, or even more so. Data, after all, is often created and maintained by people. Data is meant to be a representation of the reality of a particular moment in time. So, how do we verify whether a data set corresponds to reality? Two key verification tasks need to be performed during a data-driven investigation: an initial evaluation must occur immediately after getting the data, and findings must be verified at the end of the investigation or analysis phase.”
