
Key Takeaways
- From a $5K spike in bandwidth charges to millions of automated requests per day, AI crawlers are putting massive financial strain on hosting infrastructure.
- Bots like GPTBot and ClaudeBot are bypassing robots.txt rules, raising legal and ethical concerns as they scrape data without permission.
- To combat this, web hosting providers are adopting advanced bot detection tools and moving clients to dedicated servers.
On an otherwise average day, Kyle Wiens checked his web hosting bill and nearly fell out of his chair. His repair guide site had somehow racked up $5,000 in bandwidth charges in a single day.
When the hosting provider identified the cause, it pointed to a swarm of traffic: not from curious learners, but from bots.
This is not new. As companies like OpenAI and Anthropic race to improve their LLMs, they're sending increasingly aggressive bots to scour the web for fresh, relevant content.
Plenty of evidence has surfaced in recent years showing that these bots place major financial and capacity strain on hosting infrastructure.
For example, OpenAI’s GPTBot and Anthropic’s ClaudeBot have been reported to generate millions of requests per month, sometimes accounting for up to 20% of Google’s search crawler volume.
Bots already make up half of all web traffic, so these stats aren't entirely surprising. But the rise of AI crawlers may mark a new chapter in the age-old story of man vs. machine.
Insatiable Hunger for Content
The reason these AI bots are so hungry for content is simple: The more they scrape, the better they become at generating accurate responses.
Sites like iFixit attract a lot of scraping because they feature niche, user-generated content that LLMs crave.
Last year, ClaudeBot overwhelmed iFixit's servers, reportedly hitting them nearly a million times in a single day, despite the site's clear terms prohibiting AI crawlers from scraping its content.
Wiens took to X to call Anthropic out publicly.
He later told The Verge: “Being one of the internet’s top sites makes us pretty familiar with web crawlers and bots. We can handle that load just fine, but this was an anomaly.”
Wiens added: “My first message to Anthropic is: If you’re using this to train your model, that’s illegal. My second is: This is not polite internet behavior. Crawling is an etiquette thing.”
The crawling eventually stopped after Wiens added a crawl-delay extension to the site's robots.txt.
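For context, Crawl-delay is a non-standard robots.txt extension that asks a crawler to wait a set number of seconds between requests, and support varies from bot to bot. A minimal sketch (not iFixit's actual file; the bot names and delay value are only examples) might look like this:

```
# robots.txt -- illustrative sketch only
User-agent: ClaudeBot
Crawl-delay: 10

User-agent: GPTBot
Crawl-delay: 10
```

Because the directive is merely a request, it helps only when the crawler chooses to honor it.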
iFixit isn’t alone, though:
- The Wikimedia Foundation said its bandwidth usage has jumped 50% since January 2024, largely due to bots that ignore robots.txt.
- Read the Docs reported that one AI crawler downloaded 73 TB of zipped HTML files in 2024 — nearly 10 TB in a single day, causing more than $5,000 in bandwidth charges.
- Game UI Database said OpenAI’s GPTBot was hitting its servers at a rate of 200 requests per second.
- Freelancer.com received 3.5 million hits in just four hours from ClaudeBot, and the crawler kept going even after the team tried to block it.
Game UI Database’s founder, Edd Coates, said: “This was essentially a two-week-long DDoS attack in the form of a data heist.”
Eric Holscher of Read the Docs commented: “AI crawlers have cost us a significant amount of money in bandwidth charges, and caused us to spend a large amount of time dealing with abuse.”
Matt Barrie of Freelancer.com shared a similar frustration, noting that he was forced to block the crawlers because they weren't obeying the rules of the internet.
“This is egregious scraping, which makes the site slower for everyone operating on it and ultimately affects our revenue,” said Barrie.
Bypassing Common Safeguards
To control bot traffic, websites rely on a standard called robots.txt, a file that tells crawlers which parts of a site are off-limits.
The problem is that compliance is voluntary: bots can ignore the file, misinterpret its rules, or spoof their user agents to slip past it.
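To illustrate, a site that wants to opt out of declared AI crawlers entirely might publish rules like the sketch below (the bot names are the ones discussed above); nothing enforces these lines, so they only matter to bots that choose to comply:

```
# robots.txt -- illustrative opt-out for declared AI crawlers
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

# All other crawlers remain unrestricted
User-agent: *
Disallow:
```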
Additionally, Cloudflare reported that 30% to 40% of AI crawling now comes from bots that don’t identify themselves at all.

These “undeclared genAI crawlers” may fake their user-agent strings to appear to be normal browsers or leave them out altogether.
Some AI companies also use third-party data brokers to scrape sites on their behalf, shielding themselves from direct responsibility.
Reid Tatoris, Senior Director of Product at Cloudflare, explained: “We expect this number to grow over time as more websites block declared crawling and as the number of AI crawlers continues to explode.”
Hosts Are Hitting Back
Blocking bots isn’t easy, but the web hosts are trying.
Some common methods include filtering known bot user agents as well as identifying suspicious IP ranges (often from AWS, Azure, and GCP) and using rate-limit tools like mod_evasive
or fail2ban
.
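As a rough sketch of how that can look on an Apache server (the values are illustrative, not tuned recommendations, and the module name may differ between Apache versions), a host might pair a user-agent filter with mod_evasive's request thresholds:

```
# Illustrative Apache configuration -- values are examples only

# Refuse declared AI crawlers by user agent (mod_rewrite)
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (GPTBot|ClaudeBot) [NC]
RewriteRule .* - [F,L]

# mod_evasive: temporarily block IPs that request pages faster than these thresholds
<IfModule mod_evasive24.c>
    # Max requests for the same page per page interval (seconds)
    DOSPageCount        10
    DOSPageInterval     1
    # Max requests across the whole site per site interval (seconds)
    DOSSiteCount        100
    DOSSiteInterval     1
    # How long, in seconds, an offending IP stays blocked
    DOSBlockingPeriod   600
</IfModule>
```

A tool like fail2ban can then watch the server logs and ban repeat offenders at the firewall level.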
Cloudflare and Imperva have also launched bot detection and blocking tools.
Hosting providers, meanwhile, are encouraging site owners to move from shared hosting to dedicated servers so they can better manage bandwidth; InMotion Hosting has been actively advising customers to make the switch, and AI bot protection is becoming a standard feature across the industry.
The battle between AI crawlers and site owners could escalate into full-on war, but AI companies should pause and ask whether their current methods are worth the cost.
As Holscher put it: "AI crawlers are acting in a way that is not respectful to the sites they are crawling, and that is going to create a backlash against AI crawlers in general."