Spicy Regs

Abigail Haddad: “Want to search public comments from regulations.gov and search across multiple dockets? Eugene Kim put something together, and it’s cool. 🔗 Open it in Colab and run it there: https://colab.research.google.com/github/civictechdc/spicy-regs/blob/main/notebooks/search_capabilities.ipynb#scrollTo=QSAA6t-RyqDH

This is made possible by Mirrulations, the project from Ben Coleman at Moravian University that pulls in all of the comments from regulations.gov and puts them in an S3 bucket: https://lnkd.in/eWbPc96u”

Search Capabilities: No Server Required. Spicy Regs publishes all data as public files on Cloudflare R2. That means you can run mirrulations-search-style queries (and more) from any Python shell without running a backend. This notebook walks through the three layers of search available today:

Layer | Use when | Scope | Latency
1. Prebuilt docket index (docket_search.json.gz) | You want the same fast metadata search the site uses | Docket title + abstract (346K dockets) | One ~10 MB download, then in-memory
2. DuckDB over dockets.parquet | You need SQL flexibility on docket metadata | Docket title + abstract + other fields | <1 s per query
3. DuckDB over partitioned comments/ | Full-text search over comment bodies (the mirrulations-search equivalent) | 24M+ comments | Seconds if you narrow by agency/docket
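Layer 1 can be sketched with the Python standard library alone. This is a minimal, hypothetical sketch, not the notebook's code: it assumes the index decodes to a list of per-docket dicts with title and abstract keys (the real file may be laid out differently, for example as a prebuilt inverted index), and DOCKET_INDEX_URL is a placeholder because the post does not give the file's full URL. It downloads the gzip file once, then filters in memory, keeping dockets that contain every query term and ranking title matches first.

```python
import gzip
import json
import urllib.request
from typing import Iterable

# Hypothetical placeholder: the post names docket_search.json.gz but not its
# full public URL, so copy the real one from the notebook before running.
DOCKET_INDEX_URL = "https://<public-r2-host>/docket_search.json.gz"


def load_gzip_json(url: str, timeout: float = 60.0):
    """Download a gzip-compressed JSON file from a public URL and parse it.

    No credentials or cloud SDK: a public object is just an HTTP(S) GET.
    """
    # Some CDNs reject urllib's default User-Agent, so send an explicit one.
    req = urllib.request.Request(url, headers={"User-Agent": "spicy-regs-sketch/0.1"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        raw = resp.read()
    return json.loads(gzip.decompress(raw).decode("utf-8"))


def search_dockets(records: Iterable[dict], query: str, limit: int = 20) -> list:
    """Return records whose title or abstract contains every query term.

    Assumes 'title' and 'abstract' string fields (hypothetical names).
    Records with more terms in the title rank ahead of abstract-only hits.
    """
    terms = [t for t in query.lower().split() if t]
    if not terms:
        return []
    scored = []
    for rec in records:
        title = (rec.get("title") or "").lower()
        abstract = (rec.get("abstract") or "").lower()
        text = title + " " + abstract
        if all(t in text for t in terms):
            title_hits = sum(t in title for t in terms)
            scored.append((title_hits, rec))
    # Python's sort is stable even with reverse=True, so ties keep index order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [rec for _, rec in scored[:limit]]
```

Once the placeholder URL is replaced, a query would look like search_dockets(load_gzip_json(DOCKET_INDEX_URL), "water quality"). Plain substring matching keeps the sketch short; the site's own index may tokenize or rank differently.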

Everything below hits public URLs. No auth, no local downloads of multi-GB files…
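Layers 2 and 3 can be sketched with DuckDB's read_parquet, which reads https:// paths through the httpfs extension. This is an illustrative sketch under stated assumptions, not the notebook's code: DOCKETS_PARQUET_URL is a placeholder, and the column names (docket_id, title, abstract, comment_text) are hypothetical. DuckDB generally cannot expand glob patterns over plain HTTPS because there is no directory listing, so the comment query takes an explicit list of partition file URLs; choosing only the files for one agency or docket is presumably what keeps layer 3 in the "seconds" range. The search term is bound with ? placeholders, while file URLs are inlined as escaped SQL string literals to keep the sketch simple.

```python
from typing import Sequence

# Hypothetical placeholder: the post names dockets.parquet and a partitioned
# comments/ directory but not their URLs or schemas. Replace before running.
DOCKETS_PARQUET_URL = "https://<public-r2-host>/dockets.parquet"


def sql_quote(value: str) -> str:
    """Quote a string as a SQL literal by doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_docket_query(parquet_url: str, term: str, limit: int = 20):
    """Layer 2: case-insensitive substring match on docket title/abstract.

    Column names docket_id, title and abstract are assumed, not confirmed.
    """
    pattern = f"%{term}%"
    sql = (
        "SELECT docket_id, title "
        f"FROM read_parquet({sql_quote(parquet_url)}) "
        "WHERE title ILIKE ? OR abstract ILIKE ? "
        f"LIMIT {int(limit)}"
    )
    return sql, [pattern, pattern]


def build_comment_query(file_urls: Sequence[str], term: str, limit: int = 50):
    """Layer 3: substring match over comment text in selected partition files.

    The caller picks the partition files (e.g. one agency or docket).
    Column names docket_id and comment_text are assumed, not confirmed.
    """
    if not file_urls:
        raise ValueError("pass at least one partition file URL")
    files = "[" + ", ".join(sql_quote(u) for u in file_urls) + "]"
    sql = (
        "SELECT docket_id, comment_text "
        f"FROM read_parquet({files}) "
        "WHERE comment_text ILIKE ? "
        f"LIMIT {int(limit)}"
    )
    return sql, [f"%{term}%"]


def run_query(sql: str, params: list):
    """Execute a query with DuckDB, loading httpfs so https:// paths work."""
    import duckdb  # imported here so the builders above work without DuckDB

    con = duckdb.connect()  # in-memory database; nothing is written to disk
    con.execute("INSTALL httpfs")
    con.execute("LOAD httpfs")
    try:
        return con.execute(sql, params).fetchall()
    finally:
        con.close()
```

With real URLs and column names filled in, run_query(*build_docket_query(DOCKETS_PARQUET_URL, "methane")) returns a list of (docket_id, title) tuples. Note that % and _ inside the search term act as ILIKE wildcards.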
