Goodbye, Freeloaders: Cloudflare Changed How AI Scraping Works So Hosts Can Finally Push Back

Writer: Jordan Sprogis

Jordan Sprogis, Contributing Expert

Jordan Sprogis is a creative writer and tech researcher who has been working on online content for the better part of a decade. She holds a bachelor's degree in professional writing from Western Connecticut State University and has devoted much of her career to crafting content for various web verticals, including CyberSpyder and The Echo. Since joining HostingAdvice, Jordan has combined her storytelling ability with her fascination for advancements in technology to pen over 500 articles geared toward industry pros and newcomers alike.

Editor: Lillian Castro

Lillian Castro, Senior Editor

Lillian Castro brings more than 30 years of editing and journalism experience to our team. She has written and edited for major news organizations, including The Atlanta Journal-Constitution and the New York Times, and she previously served as an adjunct instructor at the University of Florida. Today, she edits HostingAdvice content for clarity, accuracy, and reader engagement.

Reviewer: Cristian Lopez

Cristian Lopez, News Manager

Cristian Lopez uses his Business Marketing background from the University of Illinois at Chicago to create comfortable environments for customers, clients, and colleagues to share their thoughts and ideas openly. From interviewing tech leaders to conducting UX market research projects, Cristian knows the importance of storytelling — a key variable for innovation and inspiration. His goal at HostingAdvice is to wow readers on the ever-evolving nature of the tech industry and bring his audience the most reliable and exciting content on all things hosting.


Cloudflare recently began automatically blocking AI crawlers across its network. It has also introduced a new pay-per-crawl feature (in beta) that lets site owners charge AI bots for access to their content.

Unlike traditional search engine crawlers that index pages and send back referral traffic, AI crawlers scrape content purely to train models, delivering no clicks, no ad impressions, and no affiliate conversions in return. And yet, they consume a lot of server resources.

Cloudflare’s new features can not only block unwanted AI bots but also monetize access for the ones site owners decide to let through.

Screenshot of AI audit
Take a look at how many total requests versus referral visits this audit shows. Credit: Cloudflare

“If the Internet is going to survive the age of AI, we need to give publishers the control they deserve and build a new economic model that works for everyone – creators, consumers, tomorrow’s AI founders, and the future of the web itself,” said Matthew Prince, co-founder and CEO of Cloudflare.

Many publishers, including The Associated Press, Reddit, Gannett, and The Atlantic, are in full support of Cloudflare’s decision, mainly citing the importance of rebalancing power in the age of AI and the internet.

Neil Vogel, CEO of Dotdash Meredith, emphasized the importance of content control: “We have long said that AI platforms must fairly compensate publishers and creators to use our content. We can now limit access to our content to those AI partners willing to engage in fair arrangements.”

One large sports website received 13 million visits from AI crawlers in a single month, yet only about 600 human visits reached the site as a result. With GPTBot alone generating 570 million requests per month, that’s a huge volume of traffic that site owners rarely see and never benefit from.

In fact, referral traffic from AI is 96% lower than referral traffic from Google Search. Reuters also found that Google’s own crawl-to-referral ratio has worsened from 6:1 in 2018 to 18:1 in 2024, meaning 18 pages crawled for every visitor referred. All of this is invisible traffic that sites are forced to absorb.
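To make these ratios concrete, the arithmetic can be sketched in a few lines of Python. The figures are the ones quoted above; the function name `crawl_to_referral_ratio` is a hypothetical helper for illustration, not part of any Cloudflare tooling.

```python
def crawl_to_referral_ratio(crawl_requests: int, referral_visits: int) -> float:
    """Return how many crawler requests a site absorbs per referred human visit."""
    if referral_visits <= 0:
        raise ValueError("referral_visits must be positive")
    return crawl_requests / referral_visits


# Sports site example from the article: 13 million crawler visits, about 600 referrals.
sports_site = crawl_to_referral_ratio(13_000_000, 600)
print(f"Sports site: roughly {sports_site:,.0f} crawls per referral")  # roughly 21,667

# Google's reported ratios, for comparison.
for year, ratio in {2018: 6, 2024: 18}.items():
    print(f"Google {year}: {ratio} crawls per referral")
```

By this rough measure, the sports site absorbed more than a thousand times as many crawls per referral as Google's 2024 ratio of 18:1.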

How They Work

Cloudflare will automatically block AI crawlers unless site owners on its network choose to allow them. Anyone creating a new site will be prompted during setup to choose whether to allow AI bots.

The pay-per-crawl feature is still in beta, but Cloudflare said publishers will be able to do one of three things with each crawler when the feature goes public: allow it free access, charge it a set price per request, or block it entirely.

Cloudflare is “dusting off a mostly forgotten piece of the web”: HTTP status code 402 (“Payment Required”). Crawlers that haven’t yet paid for the content will receive this response, along with the pricing.

Cloudflare acts as the Merchant of Record, meaning it handles the gatekeeping and billing, while site owners or hosts keep full control over whether to charge for their site’s content, and how much.
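The gatekeeping logic described here can be pictured with a small Python sketch. Everything in it is an assumption for illustration: the three `Policy` values, the `respond` function, the `x-example-price` and `x-example-charged` header names, and the use of 403 for blocked crawlers are hypothetical and do not reflect Cloudflare's actual API. Only the 402 status code itself comes from the article.

```python
from decimal import Decimal
from enum import Enum
from http import HTTPStatus
from typing import Dict, Optional, Tuple


class Policy(Enum):
    """Hypothetical per-crawler choices a site owner might configure."""
    ALLOW = "allow"
    CHARGE = "charge"
    BLOCK = "block"


def respond(
    policy: Policy, price: Decimal, offered: Optional[Decimal] = None
) -> Tuple[HTTPStatus, Dict[str, str]]:
    """Return (status, headers) for one crawler request under a site policy."""
    if policy is Policy.ALLOW:
        return HTTPStatus.OK, {}
    if policy is Policy.BLOCK:
        # Assumption: a blocked crawler gets 403 with no option to pay.
        return HTTPStatus.FORBIDDEN, {}
    # CHARGE: serve the content only if the crawler offered at least the price.
    if offered is not None and offered >= price:
        return HTTPStatus.OK, {"x-example-charged": f"USD {price}"}
    # Otherwise answer 402 Payment Required and advertise the price.
    return HTTPStatus.PAYMENT_REQUIRED, {"x-example-price": f"USD {price}"}


status, headers = respond(Policy.CHARGE, Decimal("0.01"))
print(int(status), headers)  # 402 {'x-example-price': 'USD 0.01'}
```

In this sketch, a crawler that receives the 402 can retry with an offer that meets the advertised price, at which point the same function returns 200.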

The Infrastructure Strain Hosts Can’t Ignore

When LLM crawlers like GPTBot and Claude hit sites with hundreds of millions of requests per month, it can cause major issues:

The Wikimedia Foundation explained it well: “Our infrastructure is built to sustain sudden traffic spikes from humans during high-interest events, but the amount of traffic generated by scraper bots is unprecedented and presents growing risks and costs.”

Site owners and infrastructure hosts have typically relied on robots.txt, which was designed to prevent crawlers from overloading websites.

Number of fetches across Vercel’s network in the past month from AI crawlers. Credit: Vercel

But even this safeguard now looks moot: although major crawler operators, like Google and Microsoft, promise to respect the protocol, a study by TollBit found that several AI companies are ignoring it.

“What this means in practical terms is that AI agents from multiple sources (not just one company) are opting to bypass the robots.txt protocol to retrieve content from sites,” TollBit told Reuters.
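Because robots.txt is a voluntary protocol, the check happens on the crawler's side, which is why it can simply be skipped. A minimal sketch of what a compliant crawler does, using Python's standard `urllib.robotparser`; the robots.txt rules, the `is_allowed` helper, and the example.com URL are illustrative assumptions:

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules: turn GPTBot away, allow every other crawler.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots.txt permits this user agent to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed("GPTBot", "https://example.com/article"))     # False
print(is_allowed("Googlebot", "https://example.com/article"))  # True
```

Nothing on the server enforces the `False` result. A crawler that never runs a check like this can fetch the page anyway.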

robots.txt is not legally enforceable, but several publishers have taken legal action against AI companies over unauthorized content use and copyright concerns. Among them is The New York Times, which implemented robots.txt and sued OpenAI.

The jury is still out on those cases, but a recent U.S. court decision regarding the use of public content hints that courts may rule in AI’s favor.

What Comes Next?

If clients stop seeing ROI from ad revenue, affiliate clicks, or search referrals, they’re more likely to downgrade, churn, or abandon their sites altogether. And the smallest plans may be the ones that take the biggest beating.

Shared and VPS environments often lack the visibility or isolation needed to detect AI crawlers (especially if the bots disguise their traffic). That means site owners or their hosts may end up paying for bot activity they can’t see and don’t even know they need to stop.
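Where access logs are available, a first pass at visibility can be as simple as tallying requests by user agent. The sketch below assumes logs in the common "combined" format, where the user agent is the last quoted field. The token list, the `count_ai_crawler_hits` helper, and the sample log lines are illustrative assumptions rather than a complete or authoritative detector, and this approach cannot catch bots that disguise their user agent.

```python
import re
from collections import Counter
from typing import Iterable

# Illustrative, non-exhaustive list of AI crawler user-agent tokens.
AI_CRAWLER_TOKENS = ("GPTBot", "ClaudeBot", "CCBot", "PerplexityBot", "Bytespider")

# In the combined log format, the user agent is the last double-quoted field.
USER_AGENT_RE = re.compile(r'"([^"]*)"\s*$')


def count_ai_crawler_hits(log_lines: Iterable[str]) -> Counter:
    """Count requests per known AI crawler token found in the user agent."""
    hits: Counter = Counter()
    for line in log_lines:
        match = USER_AGENT_RE.search(line)
        if not match:
            continue
        user_agent = match.group(1).lower()
        for token in AI_CRAWLER_TOKENS:
            if token.lower() in user_agent:
                hits[token] += 1
                break
    return hits


# Made-up sample lines using documentation-reserved IP addresses.
SAMPLE_LOG = [
    '203.0.113.5 - - [01/Jul/2025:10:00:00 +0000] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.5 - - [01/Jul/2025:10:00:01 +0000] "GET /b HTTP/1.1" 200 734 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '198.51.100.7 - - [01/Jul/2025:10:00:02 +0000] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '192.0.2.44 - - [01/Jul/2025:10:00:03 +0000] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (Windows NT 10.0) Firefox/128.0"',
]

print(count_ai_crawler_hits(SAMPLE_LOG))
```

On the sample lines, this counts two GPTBot requests and one ClaudeBot request and ignores the browser visit.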

Site owners could always upgrade, add more bandwidth, scale to a bigger plan, or invest in bot protection. But if a host encourages that without offering any protection from scraping, it can look a bit sketchy.

It’s part of why Cloudflare’s pay-per-crawl feature may be a valuable tool for hosts, especially those managing multitenant environments: There’s an opportunity to add a monetization layer that also brings real value.
