Accurate, Focused Research on Law, Technology and Knowledge Discovery Since 2002

Harvard Law School Library Embarks on Expansive Digitization Project

Erik Eckholm, NYT: “Shelves of law books are an august symbol of legal practice, and no place, save the Library of Congress, can match the collection at Harvard’s Law School Library. Its trove includes nearly every state, federal, territorial and tribal judicial decision since colonial times — a priceless potential resource for everyone from legal scholars to defense lawyers trying to challenge a criminal conviction. Now, in a digital-age sacrifice intended to serve grand intentions, the Harvard librarians are slicing off the spines of all but the rarest volumes and feeding some 40 million pages through a high-speed scanner. They are taking this once unthinkable step to create a complete, searchable database of American case law that will be offered free on the Internet, allowing instant retrieval of vital records that usually must be paid for…”

Free the Law – Overview by: Adam Ziegler – October 29, 2015

“Project Summary

Problem: Our common law is not freely accessible online. This lack of access to the law impairs justice and equality and stifles innovation.

Goal: Transform the official print versions of all historical U.S. court decisions into digital files made freely accessible online. Encourage and assist federal and state courts in making all prospective court decisions freely accessible online.

Scope:

  • All official reported decisions of the federal courts
  • All official reported decisions of the courts of every state
  • All territorial and pre-statehood decisions in the Harvard Law School Library collection
  • Estimated 43,000 volumes and 40MM pages

Process:

  1. Get the books from HLSL or Harvard Depository
  2. Scan the books using a high-speed scanner (~450K pages per week)
  3. Preserve the books in long-term underground storage
  4. Convert the scanned images into machine-readable text files
  5. Extract the individual cases into individual text files
  6. Redact headnotes and other editorial content
  7. Make the redacted images and text files freely accessible online

Projected Timeline:

  • 2015: Ramp up digitization production
  • 2016 (projected): digitize 25MM-30MM pp → publish CA, NY, MA, IL, TX, Federal
  • 2017 (projected): digitize remaining 10MM-15MM pp → publish everything…”
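
As a rough sanity check on the figures quoted above, the scan rate and page estimate can be turned into a duration. The sketch below is not part of the project overview; it assumes the ~450K pages per week rate is sustained for the whole run and that the 40MM page estimate holds. The function and constant names are illustrative only.

```python
import math

# Figures taken from the overview; both are estimates.
TOTAL_PAGES = 40_000_000      # "40MM pages"
PAGES_PER_WEEK = 450_000      # "~450K pages per week"


def weeks_to_scan(total_pages: int, pages_per_week: int) -> int:
    """Whole weeks needed to scan total_pages at a constant weekly rate."""
    return math.ceil(total_pages / pages_per_week)


weeks = weeks_to_scan(TOTAL_PAGES, PAGES_PER_WEEK)
years = weeks / 52
pages_per_year = PAGES_PER_WEEK * 52

print(f"{weeks} weeks (~{years:.1f} years)")        # 89 weeks (~1.7 years)
print(f"{pages_per_year:,} pages per 52-week year")  # 23,400,000 pages per 52-week year
```

At a constant rate this works out to roughly 23.4MM pages per year, a little under the 25MM-30MM projected for 2016, so the timeline appears to assume either a somewhat higher sustained rate or a head start from the 2015 ramp-up.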
