Accurate, Focused Research on Law, Technology and Knowledge Discovery Since 2002

Caselaw Access Project

“The Caselaw Access Project (“CAP”) expands public access to U.S. law. Our goal is to make all published U.S. court decisions freely available to the public online, in a consistent format, digitized from the collection of the Harvard Law School Library. We created CAP’s initial collection by digitizing roughly 40 million pages of court decisions contained in roughly 40,000 bound volumes owned by the Harvard Law School Library. The Harvard Law School Collection includes volumes published through 2018.

The Harvard Law School Collection was digitized on site at Langdell Hall. Members of our team created metadata for each volume, including a unique barcode, reporter name, title, jurisdiction, publication date and other volume-level information. We then used a high-speed scanner to produce JP2 and TIF images of every page. A vendor then used OCR to extract the text of every case, creating case-level XML files. Key metadata fields, like case name, citation, court and decision date, were corrected for accuracy, while the text of each case was left as raw OCR output. In addition, for cases from volumes not yet in the public domain, our vendor redacted any headnotes.

The Harvard Law School Collection does not include:

  • Cases not designated as officially published, such as most lower court decisions.
  • Non-published trial documents such as party filings, orders, and exhibits.
  • Parallel versions of cases from regional reporters, unless those cases were designated by a court as official.
  • Cases officially published in digital form, such as recent cases from Illinois, Arkansas, New Mexico, and North Carolina.
  • Copyrighted material such as headnotes, for cases still under copyright…”
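
For readers who want to picture the data the quoted description implies, here is a minimal sketch in Python. It is not CAP's actual schema or code: the class names, field names, and the `prepare_for_release` helper are all hypothetical, chosen only to mirror the volume-level metadata, the corrected case-level fields, and the headnote redaction rule mentioned above. Whether a volume is in the public domain is assumed to be decided elsewhere and passed in as a flag.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Volume:
    """Volume-level metadata (hypothetical field names)."""
    barcode: str            # unique identifier for the bound volume
    reporter: str
    title: str
    jurisdiction: str
    publication_year: int
    public_domain: bool     # assumption: determined elsewhere, not computed here


@dataclass(frozen=True)
class Case:
    """Case-level record (hypothetical field names)."""
    name: str               # corrected metadata field
    citation: str           # corrected metadata field
    court: str              # corrected metadata field
    decision_date: str      # corrected metadata field
    text: str               # raw OCR output, left uncorrected
    headnotes: Tuple[str, ...] = ()


def prepare_for_release(volume: Volume, case: Case) -> Case:
    """Return the case unchanged for public-domain volumes; otherwise strip headnotes."""
    if volume.public_domain:
        return case
    return replace(case, headnotes=())
```

The records are immutable, so redaction returns a new `Case` and the original OCR text and metadata are never altered in place.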

See also The Caselaw Access Project — Then, Now, Tomorrow
