
Why North American Hosts Are Bearing 90% of AI Crawler Traffic


Writer: Jordan Sprogis

Editor: Lillian Castro

Reviewer: Cristian Lopez

Updated: 8/26/2025

Key Takeaways

Fastly’s Q2 2025 Threat Insights report shows AI crawlers made up nearly 80% of all AI bot traffic in Q2.
Nine out of 10 crawler requests landed in North America — and that’s driving massive infrastructure and analytical costs for hosts.

Fastly’s Q2 2025 Threat Insights Report shows that nearly 80% of AI bot traffic now comes from crawlers, and 90% of that activity is targeting the U.S. and Canada.

That means North American providers have become the main course at the AI crawlers' all-you-can-eat buffet, with clients in the region facing more wasted bandwidth, server strain, and muddied analytics than anywhere else in the world.

Why North America?

Fastly suggests that a major draw is the fact that North America is home to most English-language websites.

“A significant observation is the apparent heavy reliance of most AI models on content sourced from North America. This concentration suggests a potential bias towards North American perspectives in their learned understanding,” the report reads.

Broken down, here’s what the data looks like:

North America: Almost all bot traffic is crawlers (about 90%)
Latin America: Still crawler-heavy at 72%
APAC (Asia Pacific): More balanced, but crawlers still dominate at 58%
EMEA (Europe, Middle East, Africa): Fetchers, the bots that retrieve pages in real time to answer user queries (ChatGPT lookups, for example), make up 59% of AI bot traffic

Graph by Fastly

For some, this will come as no surprise: Training sets are overwhelmingly English-based.

For example, when Common Crawl sweeps the web, almost half of everything it grabs (about 45%) is English-language content. No other language comes close: German, Russian, Japanese, French, Spanish, Chinese, etc. all sit below 6% each.

Rank	Language	Approx. % of Documents
1	English	44–46%
2	German	5.4–5.8%
3	Russian	5–6%
4	Japanese	5.1%
5	French	4.5%
6	Spanish	4.3%
7	Polish	1.7–1.8%
8	Chinese	1.1–1.5%

Source: Common Crawl

Of course, that’s no accident. Most major LLMs come out of English-speaking institutions. Take a look at OpenAI, Meta, Anthropic, and Google, all of which are U.S.-based and building first for U.S. markets.

Meta’s own LLaMA 2 paper acknowledges that more than 80% of its training data is English and even warns that the model may not perform well in other languages.

For hosts, that bias has very real consequences.

Because English content dominates training sets, U.S. and Canadian websites — and the infrastructure behind them — become the first stop for AI crawlers.

But it’s not just any sites being scraped: Fastly confirmed that eCommerce, technology, and media/entertainment are the most sought-after verticals.

Graph by Fastly

As for why, the report noted that “This likely reflects the high value of these domains in terms of fresh, dynamic, and information rich content such as product listings, news articles, reviews, and technical documentation, which are useful for training or grounding language models.”

What It Means for Hosts

Fastly reported that some crawlers spike at 1,000 requests per minute, while fetchers can hit 39,000 requests per minute. It’s enough to cause DDoS-like effects, such as slowdowns and timeouts.

On top of that, a lack of bot verification is still an issue, making it hard for security teams to distinguish legitimate automation (think search engines and uptime monitors) from bots that impersonate human visitors.


“Whether scraping for training data or delivering real-time responses, these bots create new challenges for visibility, control, and cost,” said Arun Kumar, Senior Security Researcher at Fastly. “You can’t secure what you can’t see, and without clear verification standards, AI-driven automation risks are becoming a blind spot for digital teams.”

HostingAdvice has already reported on this freeloading problem — for lack of a better term — when Cloudflare called out AI scrapers for harvesting content without consent or compensation.

If hosts don’t control crawler traffic, they basically end up subsidizing AI companies while customers pay higher bills with worse performance.

That’s a recipe for customer churn, but there is an upside.

Hosts can’t stop crawlers, but they can control how much they take, when they take it, and what it costs. Providers who treat AI bot traffic as an infrastructure challenge — not a faraway concept — will protect their clients from unforeseen charges from capacity hikes.
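One common way to control "how much they take" is a per-crawler rate limit. The following is a minimal sketch of a token-bucket limiter keyed on the crawler's user-agent token, not a description of any vendor's product. The per-minute budgets, the burst size, and the `CrawlerLimiter` and `TokenBucket` names are hypothetical; `GPTBot` and `CCBot` are the user-agent tokens published by OpenAI and Common Crawl.

```python
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    """Allows rate_per_minute requests on average, with bursts up to burst."""
    rate_per_minute: float
    burst: int
    tokens: float = field(init=False)
    updated: float = field(init=False, default=0.0)

    def __post_init__(self):
        # Start with a full bucket so the first few requests pass.
        self.tokens = float(self.burst)

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at the burst size.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate_per_minute / 60.0)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class CrawlerLimiter:
    """Applies one bucket per listed crawler; other traffic is not limited here."""

    def __init__(self, limits_per_minute: dict, burst: int = 5):
        # Assumption: budgets are illustrative and should be tuned to real capacity.
        self.buckets = {
            name: TokenBucket(rate, burst) for name, rate in limits_per_minute.items()
        }

    def allow(self, user_agent: str, now: float) -> bool:
        ua = user_agent.lower()
        for name, bucket in self.buckets.items():
            if name.lower() in ua:
                return bucket.allow(now)
        return True  # not a listed crawler


# Hypothetical budgets: 60 requests per minute for GPTBot, 30 for CCBot.
limiter = CrawlerLimiter({"GPTBot": 60, "CCBot": 30}, burst=5)
```

In a real server, `now` would come from `time.monotonic()`, and a rejected request would typically get an HTTP 429 response. Because user-agent strings can be spoofed, a limiter like this only shapes traffic from bots that identify themselves honestly.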

Whether that means rolling out llms.txt or partnering with vendors who offer bot mitigation tools, the point is to never pay for someone else’s training data.
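As an illustration of a crawler policy file, here is a short sketch that uses robots.txt, the long-established relative of the newer proposals, together with Python's standard `urllib.robotparser` module. The policy text and the `example.com` URLs are made up for the example. Compliance is voluntary, so a policy like this only affects crawlers that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: block GPTBot entirely, slow CCBot down and keep it
# out of /private/, and leave everything else open.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /private/
Crawl-delay: 10

User-agent: *
Allow: /
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


policy = build_parser(ROBOTS_TXT)

# What a well-behaved crawler would conclude from this policy.
gptbot_ok = policy.can_fetch("GPTBot", "https://example.com/blog/post")
ccbot_ok = policy.can_fetch("CCBot", "https://example.com/blog/post")
ccbot_private_ok = policy.can_fetch("CCBot", "https://example.com/private/report")
ccbot_delay = policy.crawl_delay("CCBot")
```

Checking a draft policy this way before publishing it can catch rules that block more, or less, than intended.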

About the Author

Jordan Sprogis is a creative writer and tech researcher who has been working on online content for the better part of a decade. She holds a bachelor's degree in professional writing from Western Connecticut State University and has devoted much of her career to crafting content for various web verticals, including CyberSpyder and The Echo. Since joining HostingAdvice, Jordan has combined her storytelling ability with her fascination for advancements in technology to pen over 500 articles geared toward industry pros and newcomers alike.
