Accurate, Focused Research on Law, Technology and Knowledge Discovery Since 2002

Wikipedia search-by-vibes through millions of pages offline

“What is This? This is a browser-based search engine for Wikipedia, where you can search for “the reddish tall trees on the san francisco coast” and find results like “Sequoia sempervirens” (a name of a redwood tree). The browser downloads the database, and search happens offline. To download two million Wikipedia pages with their titles takes roughly 100MB and under 50 milliseconds to see the final results. This uses sentence transformers to embed documents, product quantization to compress embeddings, pq.js to run distance computation in the browser, and transformers.js to run sentence transformers in the browser for queries. Is this good? Yes. Real-time search over millions of documents is happening in real-time completely offline. Results stream back every 10ms on a mobile device, and search results update gradually as the database is sequentially scanned.”

Sorry, comments are closed for this post.