Should Hosts Block AI Bots or Let Them Crawl? Turns Out It’s Not So Simple

Writer: Jordan Sprogis

Jordan Sprogis, Contributing Expert

Jordan Sprogis is a creative writer and tech researcher who has been working on online content for the better part of a decade. She holds a bachelor's degree in professional writing from Western Connecticut State University and has devoted much of her career to crafting content for various web verticals, including CyberSpyder and The Echo. Since joining HostingAdvice, Jordan has combined her storytelling ability with her fascination for advancements in technology to pen over 500 articles geared toward industry pros and newcomers alike.

Editor: Lillian Castro

Lillian Castro, Senior Editor

Lillian Castro brings more than 30 years of editing and journalism experience to our team. She has written and edited for major news organizations, including The Atlanta Journal-Constitution and the New York Times, and she previously served as an adjunct instructor at the University of Florida. Today, she edits HostingAdvice content for clarity, accuracy, and reader engagement.

Reviewer: Cristian Lopez

Cristian Lopez, News Manager

Cristian Lopez uses his Business Marketing background from the University of Illinois at Chicago to create comfortable environments for customers, clients, and colleagues to share their thoughts and ideas openly. From interviewing tech leaders to conducting UX market research projects, Cristian knows the importance of storytelling — a key variable for innovation and inspiration. His goal at HostingAdvice is to wow readers on the ever-evolving nature of the tech industry and bring his audience the most reliable and exciting content on all things hosting.

When Cloudflare announced it would begin blocking AI bot crawlers by default, it raised a new dilemma for hosting providers: Should they follow suit or not?

Daphne Monro from Hosting.com, at the web company’s recent Bangladesh launch event.

“That caused a lot of concerns from customers wondering what’s the right thing to do,” said Daphne Monro, head of website and content at Hosting.com.

It’s a double-edged sword. The very traffic that helps clients surface in AI overviews and GPT conversations is the same traffic that’s costing providers (and sometimes their clients) an arm and a leg.

Content will likely never be outdated, but discoverability (today’s buzzword for how easily your clients’ sites are found) is absolutely dictated by how cleanly and quickly agentic AI can understand that content.

At the same time, hosts are debating whether enabling AI bot controls actually hurts search traffic. Blocking may save on performance and cost, but it may also mean sacrificing the visibility of clients’ sites.

Option #1: Block Bots and Stop Straining Your Bandwidth

Crawlers now account for around 20% of verified bot traffic, and site operators are reporting millions of automated requests in short periods, in some cases eclipsing human visitors altogether.

AI crawlers can consume enough bandwidth and CPU to skew analytics. Vercel itself logged more than 500,000 automated requests in a single month.

Sudden CPU spikes (like the 300% surge shown here) are often linked to bursts of automated crawler activity. Source: GitHub via InMotion Hosting

On shared infrastructure, that’s the stuff of nightmares.

Monro argues the debate isn’t really about content at all. It’s about infrastructure and the kinds of choices providers are willing to make.

“Your hosting infrastructure directly affects how these agents and bots and crawlers are engaging with your site and how quickly they can access the information,” she said.

Blocking bots can pretty much guarantee stabilized performance and cloud costs, which is ideal for shared environments. That’s all good until clients stop showing up in AI search results.

Option #2: Allow Bots and Feed the Machines What They Need

Being crawlable really isn’t optional anymore, at least not if you want your clients’ sites to be seen.

“We highly recommend ensuring that your content can be crawled if being visible to a wider audience is where you’re trying to be — which most business owners are,” Monro emphasized.

Many internet users already rely on LLM interfaces for searches and answers instead of clicking through search results.

A few years ago, Google executive Prabhakar Raghavan testified that nearly 40% of young users turn to TikTok or Instagram for search-related queries (like restaurant discovery) instead of Google Search or Maps. And Gartner projects that by 2026, traditional search engine volume will drop 25% as users move to AI chatbots and virtual agents.

Projected decline in traditional search engine volume from 2022 to 2026 as users rely on NLP agents/search. Source: Leadleader and Gartner

Regardless of whether that prediction proves 100% accurate, there’s no argument that users are increasingly asking AI systems for answers instead of clicking through results.

And that means those agents have to be able to find the sites that have the information being requested.

“Structure your content by making sure that you have the right schema, product feeds, APIs, and accessibility, so that your content can be read by these bots, and served up correctly,” Monro added.

Option #3: Control Bots and Protect Your Infrastructure

The web hosting industry seems to love believing every new market shift or trend is unprecedented. But it rarely is. Even with how technologically advanced we’ve become, we’ve actually been here before.

The early ’90s web revolved around uptime, which ultimately set the long-lived baseline for being “web-ready” for years to come. By the mid-2000s, getting your site found became a game of algorithms, won by mastering SEO tactics like backlinking.

Nowadays, hosts are expected not only to provide the infrastructure but also to act as partners and advisors. And when new technologies come about, clients expect their hosts to be there, waiting with open arms.

Monitoring crawl statistics can be helpful, especially with Google Search Console, whose Crawl Stats report logs how many requests Google made to your site over time, how often, and how your server responded. Googlebot, for example, adjusts its crawl rate based on how the server responds, slowing down when the site returns errors or becomes sluggish.
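
Server access logs can offer a complementary, host-side view of the same activity. The following is a minimal sketch, assuming logs in the common "combined" format; the list of crawler tokens and the sample log lines are illustrative assumptions, not an authoritative inventory of bots or real traffic.

```python
import re
from collections import Counter

# Assumption: substrings that identify crawlers in the User-Agent header.
# Adjust this illustrative list to the bots that actually appear in your logs.
CRAWLER_TOKENS = ["Googlebot", "GPTBot", "ClaudeBot", "bingbot"]

# Matches the tail of a "combined" format log line:
# "REQUEST" STATUS SIZE "REFERER" "USER-AGENT"
LOG_TAIL = re.compile(r'"[^"]*" (\d{3}) \S+ "[^"]*" "([^"]*)"$')


def count_crawler_hits(lines):
    """Return a Counter keyed by (crawler, status); other lines are skipped."""
    hits = Counter()
    for line in lines:
        match = LOG_TAIL.search(line.strip())
        if not match:
            continue  # not in the expected format
        status, user_agent = match.groups()
        for token in CRAWLER_TOKENS:
            if token.lower() in user_agent.lower():
                hits[(token, status)] += 1
                break
    return hits


# Made-up sample lines using documentation-reserved IP addresses.
SAMPLE_LOG = [
    '203.0.113.5 - - [01/Jan/2025:00:00:01 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.5 - - [01/Jan/2025:00:00:02 +0000] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '198.51.100.7 - - [01/Jan/2025:00:00:03 +0000] "GET /b HTTP/1.1" 503 0 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
    '192.0.2.9 - - [01/Jan/2025:00:00:04 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (Windows NT 10.0)"',
]

for (crawler, status), count in sorted(count_crawler_hits(SAMPLE_LOG).items()):
    print(f"{crawler} {status}: {count}")
```

Grouping by status code matters here because a rising share of 5xx responses to a crawler is the kind of signal that can lead it to slow down or skip the site.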

Plus, schema markup could result in 20-30% higher click-through rates. One test, in particular, found that adding product schema coincided with 30% surges in organic traffic for eCommerce sites.

This is a great example of a crawlable website: clearly structured content, product schema, ratings markup, and metadata that AI agents can sort through. Source: Timmerman Group
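
For a sense of what product schema and ratings markup look like in practice, here is a minimal sketch that builds a schema.org Product object as JSON-LD. The function name and the product values are hypothetical, and a real listing would likely carry more properties than shown here.

```python
import json


def product_jsonld(name, description, price, currency, rating, review_count):
    """Build a schema.org Product object and return it as a JSON-LD string."""
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "description": description,
        "offers": {
            "@type": "Offer",
            "price": f"{price:.2f}",
            "priceCurrency": currency,
        },
        "aggregateRating": {
            "@type": "AggregateRating",
            "ratingValue": rating,
            "reviewCount": review_count,
        },
    }
    return json.dumps(data, indent=2)


# Hypothetical product values, for illustration only.
markup = product_jsonld(
    "Starter Hosting Plan", "Shared hosting for small sites.", 4.99, "USD", 4.6, 128
)

# JSON-LD is typically embedded in the page inside a script tag like this one.
print(f'<script type="application/ld+json">\n{markup}\n</script>')
```

Generating the markup from the same data that renders the page helps keep the structured data and the visible content in agreement.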

But in the specific case of AI agents, controlling bots may look a bit like policy. Not to the point of bureaucracy, but clear internal policies.

“Bots don’t browse, so you have to think of the things they can execute,” Monro said.

That may start with the basics, like refining robots.txt and experimenting with llms.txt to offer cleaner summaries for AI agents.
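
Before publishing a refined robots.txt, it can help to check that the rules do what you intend. Below is a minimal sketch using Python's standard `urllib.robotparser`; the policy itself is a hypothetical example, and `ExampleScraper` and `SomeOtherBot` are made-up bot names. Keep in mind that robots.txt is advisory, so this only shows how a compliant crawler would read the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: keep one AI crawler out of /private/,
# block a made-up scraper entirely, and allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /private/

User-agent: ExampleScraper
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(user_agent, path):
    """Return True if the parsed policy lets this user agent fetch the path."""
    return parser.can_fetch(user_agent, path)


for agent, path in [
    ("GPTBot", "/blog/post"),
    ("GPTBot", "/private/report"),
    ("ExampleScraper", "/blog/post"),
    ("SomeOtherBot", "/private/report"),
]:
    print(agent, path, is_allowed(agent, path))
```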

Cloudflare has also released a new pay-per-crawl option for high-volume access, which is available to anyone on Cloudflare’s network. Or you could opt for stronger WAF rules that distinguish verified crawlers from spambots, along with rate limiting to prevent burst traffic in shared environments.
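
To make the rate-limiting idea concrete, here is a minimal sketch of a token bucket, one common way to absorb short bursts while capping sustained request rates. This is an application-level illustration, not Cloudflare's or any WAF's actual implementation; the class names, the rate, and the capacity are all assumptions chosen for the example.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self):
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class CrawlerLimiter:
    """Keep one bucket per client key, such as a crawler's user agent or IP."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.buckets = {}

    def allow(self, key):
        if key not in self.buckets:
            self.buckets[key] = TokenBucket(self.rate, self.capacity, self.clock)
        return self.buckets[key].allow()


# Hypothetical limits: 2 requests per second sustained, bursts of up to 5.
limiter = CrawlerLimiter(rate=2, capacity=5)
decisions = [limiter.allow("ExampleBot") for _ in range(7)]
print(decisions)
```

Keying the buckets per crawler means one aggressive bot exhausts only its own allowance rather than throttling every other visitor on a shared server. The injectable `clock` parameter is there so the behavior can be tested without waiting on real time.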

Take a look at how many total requests versus referral visits this audit shows. Source: Cloudflare

“We do recommend always ensuring that you have that extra buffer room. If you’re better optimized, you are more likely to be shown up in search,” Monro told us.

The flip side of the coin is that machines, like humans, have limits to their patience. If your protocols force them to jump through hoops, then the crawler will likely bypass the site altogether.

“They [have to] engage with your site fast enough. It’s very expensive to crawl the web,” Monro said. “By being quick, you get your information out correctly, efficiently, and save time and money.”

Surviving in an Agent-First Digital World

All of this — blocking, allowing, controlling — assumes humans are still the primary audience. But Monro isn’t so sure this will remain the case.

“I think websites are going to be obsolete from a human perspective,” she said.

It’s a bold statement, especially coming from a head of content. But she’s not suggesting original sites will become obsolete. In fact, Monro has one piece of advice for anyone who’s listening.

“Just remember to be human. I think that everybody needs to remember that people want to hear the friction, the counter arguments,” she said. “That’s what makes content human and engaging.”

Because even if discovery becomes agent-driven, someone still has to produce something worth crawling.
