Accurate, Focused Research on Law, Technology and Knowledge Discovery Since 2002

HathiTrust Research Center adds 5 billion pages

News release: “…Partnering with close to 100 research libraries from around the world, HathiTrust holds about 595 terabytes of digitized textual data — that’s about 157 miles, or 10,000 tons of text. In 2010, HathiTrust launched the HTRC to help researchers around the world accomplish tera-scale data mining and textual analysis. The HTRC is a collaborative effort among Indiana University; the University of Illinois, Urbana-Champaign (UIUC); and the University of Michigan. Until recently, the HTRC had access to less than a third of the full HathiTrust repository. That all changed this year, and now the HTRC is working with the University of Michigan to enable analysis of the entire 5 billion pages of textual data in the HathiTrust repository.  “This will be the first time that a researcher could analyze, as data, a collection that is equivalent to some of the largest research libraries in the world,” says Robert McDonald, associate dean of libraries at Indiana University. This poses a new challenge for the HTRC. Most of the texts in the HathiTrust remain under copyright, so one of the chief HTRC goals is to ensure non-consumptive research access to these protected works. This stipulation has led the HTRC to create the Secure HathiTrust Analytics Research Commons (SHARC), a secure framework for researcher access to restricted content.”
