Accurate, Focused Research on Law, Technology and Knowledge Discovery Since 2002

Artificial Intelligence Patent Dataset

“To assist researchers and policymakers focusing on the determinants and impacts of artificial intelligence (AI) invention, OCE [the USPTO's Office of the Chief Economist] released two data files, collectively called the Artificial Intelligence Patent Dataset (AIPD). The first data file identifies United States (U.S.) patents issued between 1976 and 2020 and pre-grant publications (PGPubs) published through 2020 that contain one or more of several AI technology components (including machine learning, natural language processing, computer vision, speech, knowledge processing, AI hardware, evolutionary computation, and planning and control). OCE generated this data file using a machine learning (ML) approach that analyzed patent text and citations to identify AI in U.S. patent documents (Abood and Feltenberger 2018; Toole et al. 2020). OCE’s approach is based on the methodology of Abood and Feltenberger (2018), but also includes an analysis of patent claims to better identify AI contained in the technical and legal scope of the invention. The second data file contains the patent documents used to train the ML models.”

  • A working paper describing the dataset is available and can be cited as Giczy, A., Pairolero, N., and Toole, A. 2021. Identifying artificial intelligence (AI) invention: A novel AI patent dataset. USPTO Economic Working Paper Series No. 2021-2. Available at SSRN:
  • This effort was made possible through cross-business-unit collaboration among OCE, the Office of Policy and International Affairs, the Patents Business Unit, and the Office of the Chief Information Officer. The AIPD was used in the USPTO report “Inventing AI: Tracing the diffusion of artificial intelligence with U.S. patents.”
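The first AIPD data file flags which of the listed AI components each patent document contains. As a minimal sketch, assuming a CSV layout with one hypothetical 0/1 column per component (the real AIPD column names and file format are not described here and may differ), the Python below counts documents that contain at least one AI component and tallies documents per component. The sample rows are made up for illustration only.

```python
import csv
import io
from collections import Counter

# Hypothetical column names: one 0/1 flag per AI component listed in the
# AIPD description. The actual file's column names may differ.
COMPONENTS = [
    "ml", "nlp", "vision", "speech",
    "knowledge_processing", "hardware", "evolutionary", "planning_control",
]

# Made-up sample rows standing in for the first AIPD data file.
SAMPLE = """doc_id,ml,nlp,vision,speech,knowledge_processing,hardware,evolutionary,planning_control
A1,1,0,0,0,0,0,0,0
A2,0,0,0,0,0,0,0,0
A3,1,1,0,0,0,0,0,1
"""


def summarize(rows):
    """Return (number of documents with any AI component, per-component counts)."""
    any_ai = 0
    per_component = Counter()
    for row in rows:
        # Interpret each component column as a boolean flag.
        flags = {c: row[c] == "1" for c in COMPONENTS}
        if any(flags.values()):
            any_ai += 1
        for component, present in flags.items():
            if present:
                per_component[component] += 1
    return any_ai, per_component


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
any_ai, per_component = summarize(rows)
print(any_ai)               # 2
print(per_component["ml"])  # 2
```

Because a document can contain several components at once, the per-component counts can sum to more than the number of AI documents; here document A3 is counted under three components.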
