Cloudflare: “A new category, AI crawlers, has emerged in recent years. These bots collect data from across the web to train AI models, improving tools and experiences, but also raising issues around content rights, unauthorized use, and infrastructure overload. We aimed to confirm the growth of both search and AI crawlers, examine specific AI crawlers, and understand broader crawler usage.”…
Let’s start with an AI-only crawler perspective that we currently have on Cloudflare Radar, focused only on crawlers advertised as AI-related. To identify them, we use a list derived from an open-source project that helps website owners manage and control access to AI crawlers, especially those used to train large language models (LLMs). The project also provides guidance on what to include in robots.txt files (more on that below). The data shown below is based on matching those crawler names with user-agent strings in HTTP requests. (Further details about this method, including one exception, can be found at the end of the blog post.)
The AI crawler landscape shifted significantly between May 2024 and May 2025. GPTBot (from OpenAI) emerged as the dominant force, with its share surging from 5% to 30%, and Meta-ExternalAgent (from Meta) made a strong new entry at 19%. This growth came at the expense of the former leader, Bytespider, which plummeted from 42% to 7%, as well as other AI crawlers such as ClaudeBot and Amazonbot, which also declined. Our data clearly indicates a reordering of the top AI crawlers, highlighting the increasing prominence of OpenAI and Meta in this category.
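The matching approach described above, where crawler names are looked up in HTTP user-agent strings and each crawler's share is then computed, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Cloudflare's implementation. The `AI_CRAWLERS` list is a hypothetical, abbreviated stand-in for the open-source project's list. Matching is a simple case-insensitive substring check, and each crawler's share is assumed to be its fraction of all requests that matched any listed AI crawler. The helper names `match_crawler` and `crawler_shares` are likewise hypothetical.

```python
from collections import Counter
from typing import Optional

# Hypothetical, abbreviated list of AI crawler names (assumption: the real
# open-source list is longer and changes over time).
AI_CRAWLERS = ["GPTBot", "Meta-ExternalAgent", "Bytespider", "ClaudeBot", "Amazonbot"]


def match_crawler(user_agent: str) -> Optional[str]:
    """Return the first listed AI crawler name found in a User-Agent string, or None."""
    ua = user_agent.lower()
    for name in AI_CRAWLERS:
        # Case-insensitive substring match; list order decides ties.
        if name.lower() in ua:
            return name
    return None


def crawler_shares(user_agents: list[str]) -> dict[str, float]:
    """Percentage share of each AI crawler among requests matching any listed crawler.

    Requests that match no listed crawler (e.g. ordinary browsers) are excluded
    from the denominator (assumption about how "share" is defined).
    """
    counts = Counter(
        name for name in (match_crawler(ua) for ua in user_agents) if name is not None
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: 100.0 * n / total for name, n in counts.items()}


if __name__ == "__main__":
    # Simplified, illustrative User-Agent strings, not real captured traffic
    # and not the exact strings these crawlers send.
    sample = [
        "Mozilla/5.0 (compatible; GPTBot/1.0)",
        "mozilla/5.0 (compatible; gptbot/1.0)",
        "Mozilla/5.0 (compatible; GPTBot/1.0)",
        "Mozilla/5.0 (compatible; ClaudeBot/1.0)",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    ]
    print(crawler_shares(sample))  # {'GPTBot': 75.0, 'ClaudeBot': 25.0}
```

Substring matching is cheap but naive. User-Agent headers can be spoofed, and if one listed name were a substring of another, list order would decide the match. A production pipeline would likely need stricter parsing and some form of crawler verification.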