Accurate, Focused Research on Law, Technology and Knowledge Discovery Since 2002

Paper – Characterizing the Google Books Corpus

Characterizing the Google Books Corpus: Strong Limits to Inferences of Socio-Cultural and Linguistic Evolution. Eitan Adam Pechenick, Christopher M. Danforth, Peter Sheridan Dodds. PLOS ONE – Published: October 7, 2015. DOI: 10.1371/journal.pone.0137041.

“It is tempting to treat frequency trends from the Google Books data sets as indicators of the “true” popularity of various words and phrases. Doing so allows us to draw quantitatively strong conclusions about the evolution of cultural perception of a given topic, such as time or gender. However, the Google Books corpus suffers from a number of limitations which make it an obscure mask of cultural popularity. A primary issue is that the corpus is in effect a library, containing one of each book. A single, prolific author is thereby able to noticeably insert new phrases into the Google Books lexicon, whether the author is widely read or not. With this understood, the Google Books corpus remains an important data set to be considered more lexicon-like than text-like. Here, we show that a distinct problematic feature arises from the inclusion of scientific texts, which have become an increasingly substantive portion of the corpus throughout the 1900s. The result is a surge of phrases typical to academic articles but less common in general, such as references to time in the form of citations. We use information theoretic methods to highlight these dynamics by examining and comparing major contributions via a divergence measure of English data sets between decades in the period 1800–2000. We find that only the English Fiction data set from the second version of the corpus is not heavily affected by professional texts. Overall, our findings call into question the vast majority of existing claims drawn from the Google Books corpus, and point to the need to fully characterize the dynamics of the corpus before using these data sets to draw broad conclusions about cultural and linguistic evolution.”
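The abstract does not name its divergence measure. As a rough sketch only, assuming the Jensen-Shannon divergence and using made-up word counts for two hypothetical decades, a per-word comparison of that kind might look like this in Python. The function names and the sample data are illustrative assumptions, not taken from the paper.

```python
import math
from collections import Counter


def normalize(counts):
    """Turn raw word counts into a probability distribution."""
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def jsd_contributions(p, q):
    """Per-word contributions (in bits) to the Jensen-Shannon divergence.

    Assumption: JSD stands in for the unnamed "divergence measure".
    """
    contributions = {}
    for word in set(p) | set(q):
        pw, qw = p.get(word, 0.0), q.get(word, 0.0)
        mw = 0.5 * (pw + qw)  # mixture probability of the word
        term = 0.0
        if pw > 0:
            term += 0.5 * pw * math.log2(pw / mw)
        if qw > 0:
            term += 0.5 * qw * math.log2(qw / mw)
        contributions[word] = term
    return contributions


def jsd(p, q):
    """Total Jensen-Shannon divergence: 0 for identical, 1 bit for disjoint."""
    return sum(jsd_contributions(p, q).values())


# Hypothetical counts for two decades (illustrative only, not real corpus data).
decade_a = normalize(Counter({"time": 50, "love": 30, "figure": 20}))
decade_b = normalize(Counter({"time": 40, "love": 10, "figure": 50}))

contrib = jsd_contributions(decade_a, decade_b)
top_words = sorted(contrib, key=contrib.get, reverse=True)
print(top_words, round(jsd(decade_a, decade_b), 4))
```

Ranking words by their contribution shows which terms drive the difference between two periods, which is the kind of signal that would reveal a surge of academic vocabulary.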

Public Collection – public art and literacy project

“The Public Collection is a public art and literacy project developed by Rachel M. Simon to improve literacy, foster a deeper appreciation of the arts, and raise awareness for education and social justice in our community. Through a curated process, Indiana-based artists were commissioned to design unique book share stations or lending libraries that are installed in…”

Quality of Death Index 2015 – Ranking palliative care across the world

The Economist: “The UK ranks first in the 2015 Quality of Death Index, a measure of the quality of palliative care in 80 countries around the world released today by The Economist Intelligence Unit (EIU). Its ranking is due to comprehensive national policies, the extensive integration of palliative care into the National Health Service, a…”

Constitutional Bad Faith

Pozen, David, Constitutional Bad Faith (October 12, 2015). 129 Harvard Law Review (forthcoming 2016). Available for download at SSRN: “The concepts of good faith and bad faith play a central role in many areas of private law and international law. Typically associated with honesty, loyalty, and fair dealing, good faith is said to supply…”

… launches user privacy initiative

… blog: “Our new Privacy Manager puts privacy choices at your fingertips. Your privacy is serious business. That’s why we always make sure we have important safeguards in place to protect the information you provide when you visit … We’ve now introduced a tool that lets you easily control some of the information we may…”

Paper – Addressing the Empathy Deficit

Addressing the Empathy Deficit: Beliefs about the Malleability of Empathy Predict Effortful Responses when Empathy is Challenging – Karina Schumann, Jamil Zaki, and Carol S. Dweck, Stanford University. “Empathy is often thought to occur automatically. Yet, empathy frequently breaks down when it is difficult or distressing to relate to people in need, suggesting that empathy is…”

NARA Digitization Priorities based on user feedback

“A few weeks ago, we asked the public for suggestions and feedback about NARA’s digitization priorities to help us develop an agency-wide priority list. This list will guide the work of the digitization program over the next couple of years. After putting out that call, responses flooded into us with comments here on NARAtions, emails…”

Supreme Court to Highlight Revisions in Its Opinions

Supreme Court What’s New – “Beginning with the October Term 2015, postrelease edits to slip opinions on the Court’s website will be highlighted and the date they occur will be noted. The date of any revision will be listed in a new “Revised” column on the charts of Opinions, In-Chambers Opinions, and Opinions Related to…”

New Self-Guided Curriculum for Digitization

DPLA: “Through the Public Library Partnerships Project (PLPP), DPLA has been working with existing DPLA Service Hubs to provide digital skills training for public librarians and connect them sustainably with state and regional resources for digitizing, describing, and exhibiting their cultural heritage content. During the project, DPLA collaborated with trainers at Digital Commonwealth, Digital Library of Georgia, Minnesota…”

Mother Jones investigates the work of Threat Assessment Professionals

Inside the Race to Stop the Next Shooter: “…As gun rampages have increased, so have security efforts at public venues of all kinds, and threat assessment teams can now be found everywhere from school districts and college campuses to corporate headquarters and theme parks. Behind the scenes, the federal government has ramped up its threat…”

Search over 30,000,000 Historical Newspaper Pages from the USA, Canada

This site, Old Fulton New York Post Cards, reminds me of the “good old days” when the web was new and so were many of us! Take a look and enjoy: the content is both historical and current, and there are alternative ways to create search queries. Ah, just grand.

Librarian shares what profession can learn from Buzzfeed

Christina Manzo, Boston Public Library. Volume 1, Issue 3, 2015. DOI: 5 Lessons Library Websites Can Learn from Buzzfeed – “Since its 2006 launch, Buzzfeed has become an Internet institution by recognizing and capitalizing on the insatiable life-cycle of viral media. The idea behind the website is relatively simple: bring together trending content (e.g.,…”