Goodbye, Freeloaders: Cloudflare Changed How AI Scraping Works So Hosts Can Finally Push Back

Writer: Jordan Sprogis

Editor: Lillian Castro

Reviewer: Cristian Lopez

Posted: 7/9/2025


Key Takeaways

Cloudflare now blocks AI crawlers automatically to help reduce infrastructure strain and protect against unmonetized bot traffic.
It also introduced a pay-per-crawl feature that lets site owners charge AI bots for content access.
Whether through blocking or monetizing, these features let hosts recoup bandwidth costs and offer bot protection to clients.

Cloudflare recently began automatically blocking AI crawlers across its network. It has also introduced a new pay-per-crawl feature (in beta) that lets site owners charge AI bots for access to their content.

Unlike traditional search engine crawlers that index pages and send back referral traffic, AI crawlers scrape content purely to train models and deliver no clicks, no ad impressions, and no affiliate conversions in return. Yet they consume a lot of server resources.

Cloudflare’s new features can not only block unwanted AI bots but also begin to monetize access for the ones that want to come through.

Screenshot of AI audit: compare the total number of requests with the number of referral visits. Credit: Cloudflare

“If the Internet is going to survive the age of AI, we need to give publishers the control they deserve and build a new economic model that works for everyone – creators, consumers, tomorrow’s AI founders, and the future of the web itself,” said Matthew Prince, co-founder and CEO of Cloudflare.

Many publishers, including The Associated Press, Reddit, Gannett, and The Atlantic, are in full support of Cloudflare’s decision, mainly citing the importance of rebalancing power in the age of AI and the internet.

Neil Vogel, CEO of Dotdash Meredith, emphasized the importance of content control: “We have long said that AI platforms must fairly compensate publishers and creators to use our content. We can now limit access to our content to those AI partners willing to engage in fair arrangements.”

One large sports website received 13 million visits from AI crawlers in a month, with only about 600 human visits drawn to the site as a result. When GPTBot alone generates 570 million requests per month, that’s a huge amount of traffic that goes largely unnoticed and gives sites almost nothing back.

In fact, referral traffic from AI platforms is 96% lower than referral traffic from Google Search. Reuters also found that Google’s crawl-to-referral ratio has worsened from 6:1 in 2018 to 18:1 in 2024. All of this is invisible traffic that sites are forced to absorb.
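To put those figures on one scale, here is a minimal sketch that computes a crawl-to-referral ratio from the numbers quoted above. The function name is hypothetical and the inputs are the article's own figures, so treat it as back-of-the-envelope arithmetic rather than anyone's official metric.

```python
def crawl_to_referral_ratio(crawler_requests: int, referral_visits: int) -> float:
    """Return how many crawler requests occur for each referred human visit."""
    if referral_visits <= 0:
        raise ValueError("referral_visits must be positive")
    return crawler_requests / referral_visits


# Sports site example: 13 million AI crawler visits, about 600 human visits.
sports_site = crawl_to_referral_ratio(13_000_000, 600)
print(f"Sports site: roughly {sports_site:,.0f} crawls per human visit")

# Google's reported ratio moved from 6:1 (2018) to 18:1 (2024).
worsening = 18 / 6
print(f"Google's ratio worsened by a factor of {worsening:.0f}")
```

Even against the worsened 18:1 figure, the sports site's ratio is larger by three orders of magnitude.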

How They Work

Cloudflare will automatically block AI crawlers unless site owners on its network choose to allow them. Anyone creating a new site will be prompted during setup to choose whether to allow AI bots.

The pay-per-crawl feature is still in beta mode, but Cloudflare said publishers will be able to do one of three things when the feature goes public:

Grant the crawler free, unlimited access to content (no blocking)
Require payment at the domain-wide price (this must be set in advance)
Deny access entirely, with no option to bypass the block by paying

Cloudflare is “dusting off a mostly forgotten piece of the web”: HTTP status code 402 (“Payment Required”). Crawlers that haven’t yet paid for the content will receive this response, along with the pricing.
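The three publisher options and the 402 response can be sketched as a small decision function. This is an illustration under stated assumptions, not Cloudflare's implementation: the `Policy` enum, the `respond_to_crawler` function, the `x-example-*` header names, and the use of 403 for a hard block are all hypothetical and do not come from the article.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class Policy(Enum):
    ALLOW = "allow"    # free, unlimited access
    CHARGE = "charge"  # require payment at the domain-wide price
    BLOCK = "block"    # deny access, no payment bypass


@dataclass
class Response:
    status: int
    headers: Dict[str, str] = field(default_factory=dict)


def respond_to_crawler(policy: Policy, price_usd: float,
                       offered_usd: Optional[float] = None) -> Response:
    """Pick a response for a crawler request under the site's policy."""
    if policy is Policy.ALLOW:
        return Response(200)
    if policy is Policy.BLOCK:
        # Assumption: a plain 403 Forbidden stands in for "blocked".
        return Response(403)
    # Policy.CHARGE: serve content only if the crawler offers enough.
    if offered_usd is not None and offered_usd >= price_usd:
        # Hypothetical header confirming what was charged.
        return Response(200, {"x-example-charged": f"USD {price_usd:.2f}"})
    # Otherwise answer 402 Payment Required and advertise the price.
    return Response(402, {"x-example-price": f"USD {price_usd:.2f}"})


print(respond_to_crawler(Policy.CHARGE, 0.01))
print(respond_to_crawler(Policy.CHARGE, 0.01, offered_usd=0.01))
```

A crawler that gets the 402 can read the advertised price and decide whether to retry with an offer; one that never pays simply never sees the content.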

Cloudflare acts as the Merchant of Record, meaning it handles the gatekeeping and billing, while site owners or hosts can have full control over whether they want to charge (and how much) for their site’s content.

The Infrastructure Strain Hosts Can’t Ignore

When LLM crawlers like GPTBot and ClaudeBot hit sites with hundreds of millions of requests per month, it can cause major issues:

Read the Docs incurred $5,000 in bandwidth charges in one month after a single AI crawler downloaded 73 TB of files
One Reddit user said GPTBot burned through 30 TB of bandwidth in a month, adding $150 to their hosting bill
Wikipedia reported a 50% bandwidth surge when AI bots began scraping multimedia content

The Wikimedia Foundation explained it well: “Our infrastructure is built to sustain sudden traffic spikes from humans during high-interest events, but the amount of traffic generated by scraper bots is unprecedented and presents growing risks and costs.”

Site owners and infrastructure hosts have typically relied on robots.txt, which was designed to keep crawlers from overloading websites.
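For readers who have not worked with the file, here is a minimal sketch of how a well-behaved crawler would consult robots.txt, using Python's standard `urllib.robotparser`. The rules and the `example.com` URL are made up for illustration. The file is purely advisory: nothing happens unless the crawler chooses to run a check like this.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules: turn away GPTBot, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/article"
for agent in ("GPTBot", "Googlebot"):
    allowed = parser.can_fetch(agent, url)
    print(f"{agent} may fetch {url}: {allowed}")
```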

Number of fetches across Vercel’s network in the past month from AI crawlers. Credit: Vercel

But it looks like even this safeguard is moot: although major crawler operators like Google and Microsoft promise to respect the protocol, a study by TollBit found that several AI companies are ignoring it.

“What this means in practical terms is that AI agents from multiple sources (not just one company) are opting to bypass the robots.txt protocol to retrieve content from sites,” TollBit told Reuters.

robots.txt is not legally enforceable. Even so, several publishers have taken legal action against AI companies over unauthorized content use and copyright concerns, including The New York Times, which implemented robots.txt and also sued OpenAI.

The jury is still out on those cases, but a recent U.S. court decision regarding the use of public content hints that courts may rule in AI companies’ favor.

What Comes Next?

If clients stop seeing ROI from ad revenue, affiliate clicks, or search referrals, they’re more likely to downgrade, churn, or abandon their sites altogether. And the smallest plans may be the ones that take the biggest beating.

Shared and VPS environments often lack the visibility or isolation needed to detect AI crawlers (especially if the bots disguise their traffic). That means site owners or their hosts may end up paying for bot activity they can’t see and don’t even know they need to stop.
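Where access logs are available, a rough first pass at visibility is to total the bytes served to self-identified AI crawlers. The sketch below assumes logs in the common "combined" format and matches on an illustrative, incomplete list of user-agent tokens. The function name and sample lines are hypothetical, and a bot that disguises its user agent will slip straight past this check.

```python
import re
from collections import Counter
from typing import Iterable

# Illustrative tokens only; real deployments would maintain a longer list.
AI_BOT_TOKENS = ("GPTBot", "ClaudeBot")

# Matches: "METHOD PATH PROTO" STATUS BYTES "REFERER" "USER-AGENT"
LOG_PATTERN = re.compile(
    r'"\S+ \S+ \S+" (?P<status>\d{3}) (?P<bytes>\d+|-) "[^"]*" "(?P<agent>[^"]*)"'
)


def bytes_by_ai_bot(log_lines: Iterable[str]) -> Counter:
    """Sum response bytes per AI crawler token found in the user agent."""
    totals: Counter = Counter()
    for line in log_lines:
        match = LOG_PATTERN.search(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        size = 0 if match["bytes"] == "-" else int(match["bytes"])
        agent = match["agent"].lower()
        for token in AI_BOT_TOKENS:
            if token.lower() in agent:
                totals[token] += size
                break
    return totals


sample = [
    '203.0.113.5 - - [09/Jul/2025:10:00:00 +0000] "GET /a HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.5 - - [09/Jul/2025:10:00:01 +0000] "GET /b HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '198.51.100.7 - - [09/Jul/2025:10:00:02 +0000] "GET /a HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '192.0.2.9 - - [09/Jul/2025:10:00:03 +0000] "GET /a HTTP/1.1" 200 4096 "-" "Mozilla/5.0 (Windows NT 10.0)"',
]
print(bytes_by_ai_bot(sample))
```

On shared plans where tenants never see raw logs, a host would have to run this kind of tally on their behalf, which is part of the visibility gap described above.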

Site owners could always upgrade, add more bandwidth, scale to a bigger plan, or invest in bot protection. But if a host encourages that without offering any protection from scraping, it can look a bit sketchy.

It’s part of why Cloudflare’s pay‑per‑crawl feature may be a valuable tool for hosts, especially those managing multitenant environments: There’s an opportunity to add a monetization layer that also brings real value.
