AI Bots Are Eating Bandwidth; Hosting Providers Get Stuck With the Tab

On an otherwise average day, Kyle Wiens checked his web hosting bill and nearly fell out of his chair. His repair guide site, iFixit, had somehow racked up $5,000 in bandwidth charges in a single day.

When the hosting provider traced the cause, everything pointed to a swarm of traffic: not curious learners, but bots.

This is not an isolated incident. As companies like OpenAI and Anthropic race to improve their LLMs, they’re sending increasingly aggressive bots to scour the web for fresh, relevant content.

Plenty of evidence has come to light in recent years showing that these bots put major financial and capacity strain on hosting infrastructure.

For example, OpenAI’s GPTBot and Anthropic’s ClaudeBot have been reported to generate hundreds of millions of requests per month, at times reaching roughly 20% of the volume of Google’s search crawler.

Bots already make up about half of all web traffic, so these numbers aren’t entirely surprising. But the rise of AI crawlers may mark a new chapter in the age-old story of man vs. machine.

Insatiable Hunger for Content

The reason these AI bots are so hungry for content is simple: The more they scrape, the better they become at generating accurate responses.

Sites like iFixit attract a lot of scraping because they feature niche, user-generated content that LLMs crave.

Last year, ClaudeBot completely overwhelmed iFixit’s servers, hitting them nearly a million times in a single day, despite the site’s terms clearly prohibiting AI crawlers from scraping its content.

Then, Wiens took to X and posted:

Screenshot of a post by Kyle Wiens on X saying: 'Hey @AnthropicAI: I get you're hungry for data. Claude is really smart! But do you really need to hit our servers a million times in 24 hours? You're not only taking our content without paying, you're tying up our devops resources. Not cool.'
Source: X

He later told The Verge: “Being one of the internet’s top sites makes us pretty familiar with web crawlers and bots. We can handle that load just fine, but this was an anomaly.”

Wiens added: “My first message to Anthropic is: If you’re using this to train your model, that’s illegal. My second is: This is not polite internet behavior. Crawling is an etiquette thing.”

The crawling eventually stopped after Wiens added a crawl-delay directive, a nonstandard extension, to the site’s robots.txt.
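
A crawl-delay setup generally looks something like the sketch below; the user agents and delay value are illustrative rather than iFixit’s actual configuration. The directive is a nonstandard extension that cooperating crawlers read as a minimum number of seconds to wait between requests.

```
# Illustrative robots.txt sketch, not iFixit's actual file.
# Crawl-delay is a nonstandard extension: crawlers that honor it
# wait the stated number of seconds between requests.

User-agent: ClaudeBot
Crawl-delay: 10

User-agent: GPTBot
Crawl-delay: 10

# Everyone else is unaffected
User-agent: *
Disallow:
```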

iFixit isn’t alone, though:

Game UI Database’s founder, Edd Coates, said: “This was essentially a two-week-long DDoS attack in the form of a data heist.”

Eric Holscher of Read the Docs commented: “AI crawlers have cost us a significant amount of money in bandwidth charges, and caused us to spend a large amount of time dealing with abuse.”

Matt Barrie of Freelancer.com shared a similar frustration, noting that he’s been forced to block AI crawlers outright because they don’t obey the rules of the internet.

“This is egregious scraping, which makes the site slower for everyone operating on it and ultimately affects our revenue,” said Barrie.

Bypassing Common Safeguards

To control bot traffic, websites rely on a standard called robots.txt, a file that specifies which parts of a site are off-limits to crawlers.

The problem is that compliance is entirely voluntary: bots can ignore the file, misinterpret its rules, or spoof their user agents to dodge rules aimed at them.
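
To make the voluntary part concrete, here is a minimal sketch of how a polite crawler checks robots.txt using Python’s standard urllib.robotparser. Nothing on the server forces this lookup; a bot that skips it, or lies about its user agent, simply isn’t bound by the file. The site URL and user-agent string are placeholders.

```python
# Minimal sketch of a polite crawler consulting robots.txt before fetching.
# The target site and user agent below are placeholders.
from urllib import robotparser

SITE = "https://example.com"      # placeholder target site
USER_AGENT = "ExampleBot"         # a declared, honest user agent

rp = robotparser.RobotFileParser()
rp.set_url(f"{SITE}/robots.txt")
rp.read()                         # fetch and parse the file

url = f"{SITE}/guides/some-repair-guide"
if rp.can_fetch(USER_AGENT, url):
    # Honor a Crawl-delay directive if the site sets one
    delay = rp.crawl_delay(USER_AGENT) or 1
    print(f"Allowed to fetch {url}; waiting {delay}s between requests")
else:
    print(f"robots.txt disallows {url} for {USER_AGENT}; skipping")
```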

Additionally, Cloudflare reported that 30% to 40% of AI crawling now comes from bots that don’t identify themselves at all.

Webpages of the New York Times, Common Crawl, OpenAI, and Microsoft are seen on a computer.
In 2023, the New York Times filed a lawsuit against OpenAI and Microsoft, alleging that their AI models used millions of copyrighted articles to train genAI bots like ChatGPT and Copilot. Source: Shutterstock

These “undeclared genAI crawlers” may fake their user-agent strings to appear to be normal browsers or leave them out altogether.
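
One way site operators catch this kind of spoofing is forward-confirmed reverse DNS: resolve the connecting IP to a hostname, check that the hostname belongs to a domain the crawler’s operator is known to use, then resolve that hostname back and confirm it returns the same IP. The sketch below assumes Python; the IP address and domain suffixes are illustrative, and each operator’s own documentation is the authoritative source for what its crawlers should resolve to.

```python
# Sketch of forward-confirmed reverse DNS (FCrDNS) for verifying that a
# request claiming to be a known crawler really comes from its operator.
# The domain suffixes and example IP are illustrative.
import socket

# Suffixes a legitimate crawler is expected to resolve to; take the real
# values from each operator's documentation.
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com")

def verify_crawler_ip(ip: str) -> bool:
    """Return True only if the IP passes a forward-confirmed reverse DNS check."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)   # reverse lookup (PTR record)
    except OSError:
        return False                                # no PTR record at all
    if not hostname.endswith(TRUSTED_SUFFIXES):
        return False                                # hostname not in a trusted domain
    try:
        resolved = socket.gethostbyname(hostname)   # forward lookup (A record)
    except OSError:
        return False
    return resolved == ip                           # must round-trip to the same IP

# Example with an illustrative address:
print(verify_crawler_ip("66.249.66.1"))
```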

Some AI companies also use third-party data brokers to scrape sites on their behalf, shielding themselves from direct responsibility.

Reid Tatoris, Senior Director of Product at Cloudflare, explained: “We expect this number to grow over time as more websites block declared crawling and as the number of AI crawlers continues to explode.”

Hosts Are Hitting Back

Blocking bots isn’t easy, but web hosts are trying.

Common methods include filtering known bot user agents, flagging suspicious IP ranges (which often originate from AWS, Azure, and GCP), and rate-limiting with tools like mod_evasive or fail2ban.
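
As a rough illustration of the first two techniques, the sketch below scans a combined-format access log for known AI-crawler user-agent strings and for IPs making an unusually large number of requests. The log path, user-agent markers, and threshold are illustrative, and in practice the resulting IP lists would feed a firewall rule, a fail2ban jail, or a server deny list rather than just being printed.

```python
# Rough sketch: mine an access log for declared AI-crawler user agents and
# unusually chatty IPs. Log path, UA markers, and threshold are illustrative.
import re
from collections import Counter

LOG_PATH = "access.log"                      # combined-format access log (illustrative)
AI_BOT_MARKERS = ("GPTBot", "ClaudeBot", "CCBot", "Bytespider")  # crawler UA substrings
REQUEST_THRESHOLD = 5000                     # per-IP request count worth a closer look

# Combined log format: IP ... "request" status size "referer" "user-agent"
LINE_RE = re.compile(r'^(\S+) .*"[^"]*" \d{3} \S+ "[^"]*" "([^"]*)"')

hits_per_ip = Counter()
bot_ips = set()

with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
    for line in log:
        match = LINE_RE.match(line)
        if not match:
            continue
        ip, user_agent = match.groups()
        hits_per_ip[ip] += 1
        if any(marker in user_agent for marker in AI_BOT_MARKERS):
            bot_ips.add(ip)

heavy_ips = {ip for ip, count in hits_per_ip.items() if count > REQUEST_THRESHOLD}

print("Declared AI crawler IPs:", sorted(bot_ips))
print("Undeclared but unusually heavy IPs:", sorted(heavy_ips - bot_ips))
```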

Cloudflare and Imperva have also launched bot detection and blocking tools, and InMotion Hosting has been actively advising customers to make the switch to such protections.

Site owners are encouraged to move from shared hosting to dedicated so they can better manage bandwidth. AI bot protection is also becoming a standard feature from many hosting providers.

The standoff between AI crawlers and site owners could escalate into a full-on war, but AI companies should pause and ask whether this kind of aggressive scraping is worth the backlash.

As Holscher put it: “AI crawlers are acting in a way that is not respectful to the sites they are crawling, and that is going to create a backlash against AI crawlers in general.”