"The Ghost Files," published in Columbia Magazine, Winter 2013-14, by David J. Craig, profiles Columbia professor Matthew Connelly's Declassification Engine and his efforts to use big-data analysis to reveal what kinds of information the government is keeping classified.
“…Then Connelly had an idea: could he use data mining to infer what types of information were being left out of the public record? In theory, this seemed plausible, if he could compile enough materials to work with. He figured he could start by asking Columbia Libraries to give him special access to several commercial databases that the University licenses from academic publishers and which contain federal records. He could then download a wealth of material from government websites. Maybe he could even gather up documents that fellow scholars, journalists, and citizens had acquired directly from the government under the Freedom of Information Act (FOIA). No one had ever tried to analyze the entire corpus of government records as one big database before. The promise of data mining now made it seem like a worthwhile endeavor to Connelly. He thought that if he were to recruit an interdisciplinary team of data analysts and fellow historians, he might create the first system for highlighting gaps in the National Archives. Perhaps this would even shame the government into releasing more classified materials…Connelly would soon cast a new light on why the US government was slow in releasing its secrets. In doing so, he would thrust himself into a debate that had previously been taking place behind closed doors — a debate about whether the free flow of information and national security are on a collision course.”