Perma: Scoping and Addressing the Problem of Link and Reference Rot in Legal Citations

Harvard Law Review Essay by Jonathan Zittrain, Kendra Albert and Lawrence Lessig: “Works of scholarship have long cited primary sources or academic works to provide sources for facts, to incorporate previous scholarship, and to bolster arguments. The ideal citation connects an interested reader to what the author references, making it easy to track down, verify, and learn more from the indicated sources. In principle, as cited sources move to the Web, this linking should become easier. Rather than requiring a reader to travel to a library to follow the sources cited by an author, the reader should be able to retrieve the cited material immediately with a single click. But again, only in principle.

The link, a URL, points to a resource hosted by a third party. That resource will only survive so long as the third party preserves it. And as websites evolve, not all third parties will have a sufficient interest in preserving the links that provide backwards compatibility to those who relied upon those links. The author of the cited source may decide the argument in the source was mistaken and take it down. The website owner may decide to abandon one mode of organizing material for another. Or the organization providing the source material may change its views and “update” the original source to reflect its evolving views. In each case, the citing paper is vulnerable to footnotes that no longer support its claims. This vulnerability threatens the integrity of the resulting scholarship.

This problem does not exist for printed sources, or at least not in the same way. Print sources can be kept indefinitely by libraries or archives, assuming space and other determinations allow. The ability to update those original print sources is, for these purposes, happily difficult. Tracking down every original copy of an edition of a printed New York Times and changing a story on page A4 is the stuff of Orwell’s imagination, not real-world practicality.
But to do the same thing with an online edition is trivial. As newspapers, government agencies and other non-academic sources move to primarily digital publication, law review articles increasingly reference online materials, sometimes in lieu of, or in addition to, a print source. When online material does not have a formal paper counterpart such as a published book or journal article, there are few repositories that keep copies of the linked material from citations. Instead, linked material remains in the custody of its single host, rather than being distributed among libraries or readers. Because of this, materials at links frequently (1) become inaccessible or (2) change, a phenomenon known as “link rot” and “reference rot,” respectively. Link rot refers to the URL no longer serving up any content at all. Reference rot, an even larger phenomenon, happens when a link still works but the information referenced by the citation is no longer present, or has changed.

Building on previous studies of link rot, we have reviewed links published within three legal journals — the Harvard Law Review (HLR), the Harvard Journal of Law and Technology (JOLT) and the Harvard Human Rights Journal (HRJ) — as well as the links contained across all published United States Supreme Court opinions. We exploited the unique citation style of law reviews and court opinions, including the extensive cite-checking process, which meant that in almost all cases, we were able to determine whether the original information was present. Thus, our study was able to validate previous findings of link rot in law review and Supreme Court citations, as well as provide an estimate of how many said citations were affected by reference rot. We documented a serious problem of reference rot: more than 70% of the URLs within the above mentioned journals, and 50% of the URLs within U.S. Supreme Court opinions suffer reference rot — meaning, again, that they do not produce the information originally cited.

Given both of these problems, in this paper we propose a solution for authors and editors of new scholarship that will secure the long-term integrity of cited sources by involving libraries in a distributed, long-term preservation of link contents. Perma.cc, developed by the Harvard Library Innovation Lab, is a caching solution to be used by authors and journal editors in order to integrate the preservation of cited material with the act of citation. Upon direction from a paper author or editor, Perma will retrieve and save the contents of a webpage, and return a permanent link. When the work is published, the author can include that permanent citation in addition to a citation to the original URL, or just the permanent link, ensuring that even if the original is no longer available because the site goes down or changes, the cache is preserved and available.

Other services have offered permanent citations before. But those services themselves become vulnerabilities within a citation system if their own long-term viability is not assured. Perma mitigates this vulnerability by distributing the Perma caches, architecture, and governance structure to libraries across the world. Thus, so long as any library or successor within the system survives, the links within the Perma architecture will remain.”
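
For readers who want the distinction between the two failure modes in concrete terms, here is a minimal Python sketch. It is only an illustration and not the method the authors used (their study relied on human cite-checking). The names `FetchResult`, `fingerprint` and `classify` are hypothetical, and comparing a SHA-256 hash of the page body is an assumed, deliberately crude stand-in for "the cited information is still present": any byte-level change, even a cosmetic one, would count as reference rot here.

```python
import hashlib
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class CitationStatus(Enum):
    INTACT = "intact"
    LINK_ROT = "link rot"            # the URL serves no content at all
    REFERENCE_ROT = "reference rot"  # the URL works, but the content differs


@dataclass
class FetchResult:
    """Hypothetical outcome of requesting a cited URL today."""
    status_code: int
    body: Optional[bytes]


def fingerprint(content: bytes) -> str:
    """Hash of the page as it stood when it was cited (assumed proxy)."""
    return hashlib.sha256(content).hexdigest()


def classify(result: FetchResult, cited_fingerprint: str) -> CitationStatus:
    # No usable response: the link itself has rotted.
    if result.status_code != 200 or not result.body:
        return CitationStatus.LINK_ROT
    # The link resolves, but not to what the author originally cited.
    if fingerprint(result.body) != cited_fingerprint:
        return CitationStatus.REFERENCE_ROT
    return CitationStatus.INTACT


if __name__ == "__main__":
    cited = fingerprint(b"Original story text.")
    print(classify(FetchResult(404, None), cited).value)
    print(classify(FetchResult(200, b"Updated story text."), cited).value)
    print(classify(FetchResult(200, b"Original story text."), cited).value)
```

The sketch also hints at why a service along the lines of Perma stores the page contents at citation time: without a saved copy (or at least a fingerprint) of what was cited, there is nothing to compare the live page against.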
