
Quantitative Analysis of Culture Using Millions of Digitized Books

Quantitative Analysis of Culture Using Millions of Digitized Books, Jean-Baptiste Michel et al., Science, published online 16 December 2010, DOI: 10.1126/science.1199644.

  • “We constructed a corpus of digitized texts containing about 4% of all books ever printed. Analysis of this corpus enables us to investigate cultural trends quantitatively. We survey the vast terrain of ‘culturomics,’ focusing on linguistic and cultural phenomena that were reflected in the English language between 1800 and 2000. We show how this approach can provide insights about fields as diverse as lexicography, the evolution of grammar, collective memory, the adoption of technology, the pursuit of fame, censorship, and historical epidemiology. ‘Culturomics’ extends the boundaries of rigorous quantitative inquiry to a wide array of new phenomena spanning the social sciences and the humanities… We report the creation of a corpus of 5,195,769 digitized books containing ~4% of all books ever published. Computational analysis of this corpus enables us to observe cultural trends and subject them to quantitative investigation. ‘Culturomics’ extends the boundaries of scientific inquiry to a wide array of new phenomena. The corpus has emerged from Google’s effort to digitize books.” A minimal sketch of this kind of year-by-year frequency analysis appears after this list.
  • See also Geoffrey Nunberg, Chronicle of Higher Education – Counting on Google Books
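Purely as an illustration of the kind of analysis the abstract describes, here is a minimal Python sketch that computes a word's relative frequency per year from tab-separated n-gram counts. The file names (ngrams_1gram.tsv, total_counts.tsv), the three-column layout (word, year, count), and the two-column totals file are simplifying assumptions for the example; they are not the exact layout of Google's published n-gram datasets.

```python
"""Minimal sketch: yearly relative frequency of a word from n-gram-style counts.

Assumptions (not the real Google Books Ngram export format):
  - ngrams_1gram.tsv : word <TAB> year <TAB> count
  - total_counts.tsv : year <TAB> total_tokens_that_year
"""
import csv
from collections import defaultdict


def yearly_relative_frequency(ngram_path: str, totals_path: str, word: str) -> dict:
    """Return {year: count_of_word / total_tokens_that_year}."""
    word_counts = defaultdict(int)
    with open(ngram_path, newline="", encoding="utf-8") as f:
        for token, year, count in csv.reader(f, delimiter="\t"):
            if token == word:
                word_counts[int(year)] += int(count)

    totals = {}
    with open(totals_path, newline="", encoding="utf-8") as f:
        for year, total in csv.reader(f, delimiter="\t"):
            totals[int(year)] = int(total)

    # Normalize raw counts by the size of that year's corpus slice.
    return {
        year: count / totals[year]
        for year, count in sorted(word_counts.items())
        if totals.get(year)
    }


if __name__ == "__main__":
    # Hypothetical file names; print the 1800-2000 trend for one example word.
    trend = yearly_relative_frequency("ngrams_1gram.tsv", "total_counts.tsv", "slavery")
    for year, freq in trend.items():
        if 1800 <= year <= 2000:
            print(f"{year}\t{freq:.3e}")
```

Normalizing by each year's total token count, rather than comparing raw counts, is what lets trends from 1800 and 2000 be compared despite the very different number of books printed in those years.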