Ensure Ad-Related Bots Can Access Your Website
Good bot, bad bot: A strategic approach to bot management
You've heard of "good cop, bad cop," but when it comes to website bots, this isn't a game. It's a strategic decision that directly impacts your revenue. The challenge is simple: welcome the bots that verify your ad supply chain and generate income (the "good bots"), while blocking the AI scrapers that consume bandwidth without providing value (the "bad bots").
Many publishers have taken the nuclear option of blocking all bots, only to discover they've accidentally blocked the very bots that enable their advertising revenue. This newsletter will help you implement a smart, strategic approach to bot management.
A note before you dive in: Yes, this is a longer read than our usual updates. But we're confident this is one of the most important newsletters you'll receive this year. The decisions you make about bot access directly impact your ad revenue, SEO visibility, and social media reach. With Amazon's March 2026 deadline approaching and the rapid evolution of AI crawlers, getting this right now will save you from revenue loss and headaches down the road. We've included specific code examples, verification steps, and strategic guidance to make implementation as straightforward as possible.
Why this matters
Many publishers have recently blocked all bots using Cloudflare's bot management to combat AI crawlers. While understandable, this approach can accidentally block critical bots that verify your advertising supply chain and analyze your content for ad quality.
Starting March 2, 2026, Amazon Ads will stop serving ads to websites that block their verification bot. This means potential revenue loss if not addressed promptly.
Beyond Amazon, other essential bots include:
- Google bots - Verify ad inventory and prevent fraud
- Supply Side Platform (SSP) bots - Validate ads.txt and sellers.json files
- Ad quality bots - Ensure brand safety and contextual targeting
- Brand safety crawlers - IAS (Integral Ad Science), DoubleVerify, and other verification services analyze your content to rate your inventory for advertisers. If these bots can't access your site, verification partners cannot accurately assess your content, which may make your inventory inaccessible to their advertiser clients or result in lower-quality demand
When these bots can't access your site, you risk:
- Lost ad revenue from major demand sources
- Reduced fill rates and lower CPMs
- Being removed from premium ad programs
The AI crawler dilemma: Block or allow?
AI crawlers represent one of the most controversial topics in digital publishing today. The decision to block or allow them isn't black and white - it requires understanding what you're actually giving up or gaining.
The case for blocking AI crawlers
Many of the AI bots in our block list are primarily used for training large language models. When you allow these bots, you're essentially providing free training data that:
- Powers AI systems that compete with you for audience attention - Users get answers directly from ChatGPT, Claude, or Perplexity instead of visiting your site
- Provides no direct traffic or attribution - Unlike search engines that send visitors to your site, many AI training crawlers simply consume your content without giving anything back
- Uses your server resources - These bots can be aggressive crawlers that consume bandwidth without generating revenue
- Trains models on your proprietary content - Your unique analysis, research, and expertise becomes part of a model's general knowledge
The bots we include in the block list - GPTBot, ClaudeBot, CCBot, Google-Extended, and others - are primarily focused on gathering training data.
The case for allowing (some) AI crawlers
However, the AI landscape is evolving, and blocking everything AI-related may not be the best long-term strategy:
- AI search and discovery - Some AI bots (like ChatGPT-User and Perplexity) can drive traffic by citing your content as sources and linking back to your site. They're becoming a new discovery channel, similar to how Google Search works
- Product visibility - If you run an e-commerce site, AI shopping assistants and agents may help users discover your products. Blocking these bots could mean missing out on AI-driven commerce
- Attribution and citations - Some AI systems are building attribution mechanisms. Early cooperation might position you favorably as these systems mature
- Future revenue models - The AI industry is exploring compensation models for content creators. Early blockers might be excluded from these programs
Our recommendation: Strategic blocking
We recommend a nuanced approach:
- Block pure training bots - Bots like GPTBot, Google-Extended, CCBot, and anthropic-ai that are primarily used for model training provide little immediate value. Our examples block these by default.
- Consider allowing search-oriented AI bots - If you want visibility in AI-powered search and answers, you might choose to allow ChatGPT-User (which powers ChatGPT search) or PerplexityBot (which cites and links to sources). You can easily modify our robots.txt examples to allow these.
- Monitor and adjust - Check your server logs regularly to see which AI bots are hitting your site and how aggressively. Some publishers allow AI crawlers but use Cloudflare rate limiting to prevent abuse.
- Distinguish between AI training and AI agents - Meta's FacebookBot and meta-externalagent are both AI-related, but facebookexternalhit (for link previews) should always be allowed. The distinction matters.
Important note: The AI crawler landscape is evolving rapidly. What's primarily a training bot today might become a traffic driver tomorrow. We recommend revisiting your AI crawler strategy quarterly and staying informed about how these systems are developing attribution and compensation mechanisms.
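The "monitor and adjust" step above can be scripted against your access logs. Here is a minimal sketch, using only the Python standard library, that tallies hits per AI crawler from combined-format log lines; the bot list is abbreviated and should be extended to match your own block list:

```python
from collections import Counter

# Abbreviated list; extend to match your full block list.
AI_BOTS = ["GPTBot", "ClaudeBot", "CCBot", "PerplexityBot", "Bytespider"]

def tally_ai_hits(log_lines):
    """Count requests per AI bot by substring-matching the user-agent field."""
    hits = Counter()
    for line in log_lines:
        lowered = line.lower()
        for bot in AI_BOTS:
            if bot.lower() in lowered:
                hits[bot] += 1
    return hits

# Two synthetic combined-format log lines for illustration:
sample = [
    '203.0.113.7 - - [01/Jan/2025:00:00:01 +0000] "GET /article HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; GPTBot/1.1)"',
    '198.51.100.9 - - [01/Jan/2025:00:00:02 +0000] "GET / HTTP/1.1" 200 1024 "-" "CCBot/2.0"',
]
print(tally_ai_hits(sample))
```

In practice you would feed it your real log file (for example, `open("/var/log/nginx/access.log")`) instead of the sample list, and review the counts before deciding which bots to block or rate-limit.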
What you need to do
Important disclaimer: The robots.txt examples below are templates to help you get started. Do not blindly copy and paste them. Your robots.txt file is a critical control mechanism that should be carefully curated for your specific needs and thoroughly tested after any changes. That's why we've built a verification tool (see below) to help you check that everything works correctly.
Also keep in mind that robots.txt is based on an honor system - well-behaved bots respect it, but malicious bots will ignore it completely. For bad actors that don't respect robots.txt, you'll need to enforce blocking at the firewall level using Cloudflare WAF rules or other bot detection tools. Think of robots.txt as your first line of defense for good bots, and Cloudflare rules as your enforcement layer for bad bots.
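Before editing anything, it helps to see how a compliant crawler actually evaluates these files. Python's standard-library parser follows the same group-matching rules many well-behaved bots use, so you can replay a draft robots.txt against any user agent. A sketch using a stripped-down version of the templates in this guide:

```python
from urllib.robotparser import RobotFileParser

# A stripped-down version of the templates below.
RULES = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /ads.txt
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

print(parser.can_fetch("GPTBot", "https://example.com/article"))      # False: named group applies
print(parser.can_fetch("AmazonAdBot", "https://example.com/ads.txt"))  # True: falls through to *
print(parser.can_fetch("GPTBot", "https://example.com/ads.txt"))       # False: a bot obeys only its
                                                                       # most specific matching group
```

Note the third check: a bot follows only the most specific group that matches it, so a bot blocked with `Disallow: /` under its own name cannot reach ads.txt either, regardless of the `User-agent: *` allow. That's fine for AI training bots, but it's why ad and verification bots need their own explicit Allow groups rather than relying on the `*` fallback.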
Choose the scenario that matches your setup:
Scenario A: You DON'T use Cloudflare
If you're not using Cloudflare or similar bot management services, you only need to configure your robots.txt file.
Step 1: Ensure critical bots can access your site
CRITICAL: Your ads.txt file MUST be accessible to all bots. Never block access to /ads.txt or /app-ads.txt.
Update your robots.txt file to explicitly allow ad-related bots:
# CRITICAL: Ensure ads.txt is accessible to ALL bots
User-agent: *
Allow: /ads.txt
Allow: /app-ads.txt
# Ad Verification & Monetization Bots
# More info: https://radar.cloudflare.com/bots/directory
User-agent: AmazonAdBot # https://radar.cloudflare.com/bots/directory/amazonadbot
Allow: /
User-agent: Mediapartners-Google # https://radar.cloudflare.com/bots/directory/googlemedia
Allow: /
User-agent: AdsBot-Google # https://radar.cloudflare.com/bots/directory/googleads
Allow: /
# Search Engine Bots (Critical for SEO & Discovery)
# More info: https://radar.cloudflare.com/bots/directory
User-agent: Googlebot # https://radar.cloudflare.com/bots/directory/google
Allow: /
User-agent: Googlebot-Image # https://radar.cloudflare.com/bots/directory/googlebotimages
Allow: /
User-agent: Bingbot # https://radar.cloudflare.com/bots/directory/bing
Allow: /
User-agent: Slurp # https://radar.cloudflare.com/bots/directory/yahooslurp
Allow: /
User-agent: DuckDuckBot # https://radar.cloudflare.com/bots/directory/duckduckbot
Allow: /
User-agent: Baiduspider # https://radar.cloudflare.com/bots/directory/baidu
Allow: /
User-agent: YandexBot # https://radar.cloudflare.com/bots/directory/yandex
Allow: /
User-agent: Applebot # https://radar.cloudflare.com/bots/directory/apple
Allow: /
# Social Media Bots (Critical for Audience Growth)
# More info: https://radar.cloudflare.com/bots/directory
User-agent: facebookexternalhit # https://radar.cloudflare.com/bots/directory/facebook
Allow: /
User-agent: Twitterbot # https://radar.cloudflare.com/bots/directory/twitterbot
Allow: /
User-agent: LinkedInBot # https://radar.cloudflare.com/bots/directory/linkedin
Allow: /
User-agent: Pinterestbot # https://radar.cloudflare.com/bots/directory/pinterest
Allow: /
User-agent: Slackbot # https://radar.cloudflare.com/bots/directory/slackbot
Allow: /
User-agent: WhatsApp
Allow: /
User-agent: TelegramBot
Allow: /
User-agent: Discordbot
Allow: /
# Brand safety & ad verification bots
# More info: https://radar.cloudflare.com/traffic/verified-bots
User-agent: ias_crawler # https://radar.cloudflare.com/bots/directory/ias-crawler
Allow: /
User-agent: ias_wombles
Allow: /
User-agent: Leikibot
Allow: /
Important: Remove any existing Disallow or Crawl-delay directives for these bots.
Step 2: Block AI crawlers (optional)
If you want to prevent AI scrapers from accessing your content, add these lines to your robots.txt:
# Block AI Training Crawlers
# More info: https://radar.cloudflare.com/bots/directory
User-agent: GPTBot # https://radar.cloudflare.com/bots/directory/gptbot
Disallow: /
User-agent: ChatGPT-User # https://radar.cloudflare.com/bots/directory/chatgpt-user
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: anthropic-ai
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: PerplexityBot
Disallow: /
User-agent: Omgilibot
Disallow: /
User-agent: Bytespider
Disallow: /
User-agent: FacebookBot
Disallow: /
User-agent: meta-externalagent # https://radar.cloudflare.com/bots/directory/meta-externalagent
Disallow: /
# CRITICAL: Even when blocking bots, always allow ads.txt
User-agent: *
Allow: /ads.txt
Allow: /app-ads.txt
Note: Robots.txt is a suggestion that well-behaved bots follow, but it's not enforced. For strict enforcement, you'll need a WAF or bot management service.
Scenario B: You use Cloudflare and currently BLOCK all bots
If you've implemented broad bot blocking in Cloudflare to combat AI scrapers, you need to create exceptions for ad-related bots.
Step 1: Update your robots.txt file
CRITICAL: Your ads.txt file MUST be accessible. Verify that /ads.txt and /app-ads.txt are not blocked.
Add these lines to your robots.txt:
# CRITICAL: Ensure ads.txt is accessible to ALL bots
User-agent: *
Allow: /ads.txt
Allow: /app-ads.txt
# Ad Verification & Monetization Bots
# More info: https://radar.cloudflare.com/bots/directory
# https://radar.cloudflare.com/traffic/verified-bots
User-agent: AmazonAdBot # https://radar.cloudflare.com/bots/directory/amazonadbot
Allow: /
User-agent: Mediapartners-Google # https://radar.cloudflare.com/bots/directory/googlemedia
Allow: /
User-agent: AdsBot-Google # https://radar.cloudflare.com/bots/directory/googleads
Allow: /
# Search Engine Bots (Critical for SEO & Discovery)
# More info: https://radar.cloudflare.com/bots/directory
# https://radar.cloudflare.com/traffic/verified-bots
User-agent: Googlebot # https://radar.cloudflare.com/bots/directory/google
Allow: /
User-agent: Googlebot-Image # https://radar.cloudflare.com/bots/directory/googlebotimages
Allow: /
User-agent: Bingbot # https://radar.cloudflare.com/bots/directory/bing
Allow: /
User-agent: Slurp # https://radar.cloudflare.com/bots/directory/yahooslurp
Allow: /
User-agent: DuckDuckBot # https://radar.cloudflare.com/bots/directory/duckduckbot
Allow: /
User-agent: Baiduspider # https://radar.cloudflare.com/bots/directory/baidu
Allow: /
User-agent: YandexBot # https://radar.cloudflare.com/bots/directory/yandex
Allow: /
User-agent: Applebot # https://radar.cloudflare.com/bots/directory/apple
Allow: /
# Social Media Bots (Critical for Audience Growth)
# More info: https://radar.cloudflare.com/bots/directory
# https://radar.cloudflare.com/traffic/verified-bots
User-agent: facebookexternalhit # https://radar.cloudflare.com/bots/directory/facebook
Allow: /
User-agent: Twitterbot # https://radar.cloudflare.com/bots/directory/twitterbot
Allow: /
User-agent: LinkedInBot # https://radar.cloudflare.com/bots/directory/linkedin
Allow: /
User-agent: Pinterestbot # https://radar.cloudflare.com/bots/directory/pinterest
Allow: /
User-agent: Slackbot # https://radar.cloudflare.com/bots/directory/slackbot
Allow: /
User-agent: WhatsApp
Allow: /
User-agent: TelegramBot
Allow: /
User-agent: Discordbot
Allow: /
# Brand safety & ad verification bots
# More info: https://radar.cloudflare.com/traffic/verified-bots
User-agent: ias_crawler # https://radar.cloudflare.com/bots/directory/ias-crawler
Allow: /
User-agent: ias_wombles
Allow: /
User-agent: Leikibot
Allow: /
Important: Remove any existing Disallow or Crawl-delay directives for these bots.
Step 2: Create Cloudflare exceptions for critical bots
Robots.txt alone won't help if Cloudflare is blocking bots at the firewall level. You must allowlist them:
Important: Many of these bots use static IP address ranges. If you have IP-based blocking rules (country blocks, ASN blocks, or IP range blocks), you may need to create exceptions for bot IP ranges in addition to user-agent allowlisting. Google, Microsoft/Bing, and major platforms publish their official IP ranges - check the Cloudflare bot directory links for details.
- Log into your Cloudflare dashboard
- Go to Security > WAF > Custom Rules
- Create a new rule to allow AmazonAdBot:
- Rule name: Allow AmazonAdBot
- Field: User Agent
- Operator: contains
- Value: AmazonAdBot
- Action: Skip > All remaining custom rules
- Save and deploy
- Repeat for other critical bots:
- Ad bots: AdsBot-Google, Mediapartners-Google
- Search engines: Googlebot, Bingbot, Baiduspider, YandexBot, DuckDuckBot, Applebot, Slurp
- Social media: facebookexternalhit, Twitterbot, LinkedInBot, Pinterestbot, Slackbot, WhatsApp, TelegramBot, Discordbot
- Brand safety & ad verification: ias_crawler, ias_wombles, Leikibot
- SSP-specific bots: Any additional bots from your demand partners
Pro tip: You can create one combined rule with multiple conditions using "or" logic to streamline setup. For example: (User Agent contains "AmazonAdBot") or (User Agent contains "Googlebot") or (User Agent contains "facebookexternalhit")
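If you build the combined rule in Cloudflare's Expression Editor rather than the dropdown builder, the equivalent expression looks like the fragment below (the user-agent header is exposed as the `http.user_agent` field in Cloudflare's Rules language; treat this as a sketch and confirm it in the expression preview before deploying):

```
(http.user_agent contains "AmazonAdBot") or (http.user_agent contains "Googlebot") or (http.user_agent contains "facebookexternalhit")
```

Set the rule's action to Skip > All remaining custom rules, exactly as in the step-by-step instructions above.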
Note on IP-based blocks: If you block by country, ASN, or IP range, user-agent rules alone won't work. You'll need to add IP allowlist rules. For example, Facebook crawls from ASN 32934 and 63293, and Bingbot from ASN 8075. Use the Cloudflare bot directory to find official IP ranges for each bot.
Step 3: Ensure ads.txt is accessible
Create a specific rule to ensure ads.txt files are never blocked:
- In Cloudflare, go to Security > WAF > Custom Rules
- Create a new rule:
- Rule name: Allow ads.txt access
- Field: URI Path
- Operator: contains
- Value: ads.txt
- Action: Skip > All remaining custom rules
- Save and deploy
Better Alternative: Consider switching to Scenario C's approach (allow by default, block selectively) for easier maintenance.
Scenario C: You use Cloudflare and currently ALLOW all bots
If you're not currently blocking bots but want to prevent AI scrapers while maintaining ad revenue, use this selective blocking approach.
Step 1: Update your robots.txt file
CRITICAL: Ensure your ads.txt file remains accessible. Never add rules that would block /ads.txt or /app-ads.txt.
Verify your robots.txt allows critical bots and blocks AI scrapers:
# CRITICAL: Ensure ads.txt is accessible to ALL bots
User-agent: *
Allow: /ads.txt
Allow: /app-ads.txt
# Allow Ad-Related Bots
# https://radar.cloudflare.com/traffic/verified-bots
User-agent: AmazonAdBot # https://radar.cloudflare.com/bots/directory/amazonadbot
Allow: /
User-agent: Mediapartners-Google # https://radar.cloudflare.com/bots/directory/googlemedia
Allow: /
User-agent: AdsBot-Google # https://radar.cloudflare.com/bots/directory/googleads
Allow: /
# Allow Search Engine Bots
# More info: https://radar.cloudflare.com/bots/directory
# https://radar.cloudflare.com/traffic/verified-bots
User-agent: Googlebot # https://radar.cloudflare.com/bots/directory/google
Allow: /
User-agent: Bingbot # https://radar.cloudflare.com/bots/directory/bing
Allow: /
User-agent: Slurp # https://radar.cloudflare.com/bots/directory/yahooslurp
Allow: /
User-agent: DuckDuckBot # https://radar.cloudflare.com/bots/directory/duckduckbot
Allow: /
User-agent: Baiduspider # https://radar.cloudflare.com/bots/directory/baidu
Allow: /
User-agent: YandexBot # https://radar.cloudflare.com/bots/directory/yandex
Allow: /
User-agent: Applebot # https://radar.cloudflare.com/bots/directory/apple
Allow: /
# Allow Social Media Bots (Critical for Audience Growth)
# More info: https://radar.cloudflare.com/bots/directory
# https://radar.cloudflare.com/traffic/verified-bots
User-agent: facebookexternalhit # https://radar.cloudflare.com/bots/directory/facebook
Allow: /
User-agent: Twitterbot # https://radar.cloudflare.com/bots/directory/twitterbot
Allow: /
User-agent: LinkedInBot # https://radar.cloudflare.com/bots/directory/linkedin
Allow: /
User-agent: Pinterestbot # https://radar.cloudflare.com/bots/directory/pinterest
Allow: /
User-agent: Slackbot # https://radar.cloudflare.com/bots/directory/slackbot
Allow: /
User-agent: WhatsApp
Allow: /
User-agent: TelegramBot
Allow: /
User-agent: Discordbot
Allow: /
# Brand safety & ad verification bots
# More info: https://radar.cloudflare.com/traffic/verified-bots
User-agent: ias_crawler # https://radar.cloudflare.com/bots/directory/ias-crawler
Allow: /
User-agent: ias_wombles
Allow: /
User-agent: Leikibot
Allow: /
# Block AI Training Crawlers
# More info: https://radar.cloudflare.com/bots/directory
# https://radar.cloudflare.com/traffic/verified-bots
User-agent: GPTBot # https://radar.cloudflare.com/bots/directory/gptbot
Disallow: /
User-agent: ChatGPT-User # https://radar.cloudflare.com/bots/directory/chatgpt-user
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: anthropic-ai
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: PerplexityBot
Disallow: /
User-agent: Omgilibot
Disallow: /
User-agent: Bytespider
Disallow: /
User-agent: FacebookBot
Disallow: /
User-agent: meta-externalagent # https://radar.cloudflare.com/bots/directory/meta-externalagent
Disallow: /
Important: Note the distinction between facebookexternalhit (allow - generates link previews) and FacebookBot (block - AI training). Blocking FacebookBot while allowing facebookexternalhit ensures your links look good on social media while protecting your content from AI training.
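You can confirm this split behaves as intended by replaying just the two Meta-related groups through Python's standard-library robots.txt parser:

```python
from urllib.robotparser import RobotFileParser

# Only the two Meta-related groups from the template above.
META_RULES = """\
User-agent: facebookexternalhit
Allow: /

User-agent: FacebookBot
Disallow: /
"""

fb_parser = RobotFileParser()
fb_parser.parse(META_RULES.splitlines())

print(fb_parser.can_fetch("facebookexternalhit", "https://example.com/post"))  # True: link previews work
print(fb_parser.can_fetch("FacebookBot", "https://example.com/post"))          # False: AI training blocked
```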
Step 2: Enforce AI bot blocking in Cloudflare
Robots.txt is just a suggestion. To truly block AI scrapers, enforce it in Cloudflare:
- Go to Security > WAF > Custom Rules
- Create a rule to block AI bots:
- Rule name: Block AI Scrapers
- Field: User Agent
- Operator: contains
- Value: Enter one bot name (e.g., "GPTBot")
- Action: Block
- Repeat for each AI bot listed above
Pro tip: You can combine multiple user agents in a single rule using "or" conditions to streamline your setup. For example, create one rule with the condition: (User Agent contains "GPTBot") or (User Agent contains "ClaudeBot") or (User Agent contains "CCBot")
Step 3: Verify ads.txt accessibility
Double-check that your Cloudflare rules don't accidentally block ads.txt:
- Visit https://yoursite.com/ads.txt in a browser
- Check Cloudflare Firewall Events to ensure no blocks on ads.txt requests
- If needed, create an explicit allow rule for ads.txt (see Scenario B, Step 3)
Verification instructions
After making these changes:
- Verify your robots.txt is accessible: Visit yoursite.com/robots.txt in a browser to confirm the file loads and contains your rules
- Use webmaster verification tools: Cloudflare's firewall may block bot IP ranges, making verification tools essential:
- Google Search Console: Use the robots.txt report (under Settings) to confirm Google can fetch and parse your file; the standalone robots.txt Tester has been retired
- Bing Webmaster Tools: Use the Verify Bingbot tool and Site Scan feature
- Facebook Sharing Debugger: Test that facebookexternalhit can access your pages for link previews
- Monitor your Cloudflare Firewall Events to ensure bots are getting through and not being blocked by WAF rules
- Use Publisher Collective's Bot Access Checker: We've built a quick verification tool that tests accessibility for all the critical bots mentioned in this guide. Simply enter your domain at https://botcheck.publisher-collective.com/ to get an instant report on which bots can access your site and ads.txt file.
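If you'd like to script your own spot check alongside these tools, here is a minimal sketch using only the Python standard library. The user-agent strings are illustrative abbreviations of what the real bots send, and a non-200 result is a prompt to investigate rather than proof the real bot is blocked: bot-management layers also validate source IPs, so a spoofed user agent from your machine may be challenged even when the genuine crawler passes.

```python
import urllib.error
import urllib.request

# Illustrative user-agent strings; real bots send longer tokens.
CRITICAL_AGENTS = ["AmazonAdBot", "Mediapartners-Google", "facebookexternalhit/1.1"]

def build_targets(domain, paths=("/ads.txt", "/app-ads.txt", "/robots.txt")):
    """The URLs every audit run should cover."""
    return [f"https://{domain}{path}" for path in paths]

def fetch_status(url, user_agent):
    """HTTP status received when requesting url as user_agent; -1 on network failure."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code
    except urllib.error.URLError:
        return -1

def audit(domain):
    """Print one status line per (URL, bot) pair; 200 means accessible."""
    for url in build_targets(domain):
        for agent in CRITICAL_AGENTS:
            status = fetch_status(url, agent)
            flag = "OK" if status == 200 else "CHECK"
            print(f"{flag:<6}{status:>4}  {agent:<25}{url}")

# audit("yoursite.com")  # substitute your own domain before running
```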
Additional resources
Cloudflare Bot Directory: If you use Cloudflare, visit their Bot Management directory at https://radar.cloudflare.com/traffic/verified-bots to:
- Get detailed information about verified bots and their behavior
- Verify which bots are legitimate by checking their IP ranges
- Manage bot access policies for your domains
- Monitor bot traffic patterns and trends
This directory is particularly useful when you encounter an unfamiliar bot in your logs and need to determine whether it's legitimate before adding it to your allowlist.
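Google and Microsoft also document a DNS-based legitimacy check: a genuine crawler IP reverse-resolves to a hostname under an official domain, and that hostname forward-resolves back to the same IP. A sketch of that double lookup, with the suffix table abbreviated (extend it from each vendor's documentation):

```python
import socket

# Official reverse-DNS suffixes for a few verified crawlers (abbreviated;
# consult each vendor's documentation for the authoritative list).
OFFICIAL_SUFFIXES = {
    "Googlebot": (".googlebot.com", ".google.com"),
    "Bingbot": (".search.msn.com",),
}

def hostname_is_official(hostname, suffixes):
    """Pure check: does the PTR hostname fall under an official domain?"""
    return hostname.endswith(tuple(suffixes))

def verify_crawler_ip(ip, bot):
    """Reverse-resolve the IP, check the domain, then forward-confirm."""
    try:
        hostname = socket.gethostbyaddr(ip)[0]
    except (socket.herror, socket.gaierror):
        return False
    if not hostname_is_official(hostname, OFFICIAL_SUFFIXES[bot]):
        return False  # PTR record points outside the official domain
    try:
        return ip in socket.gethostbyname_ex(hostname)[2]  # forward-confirm
    except socket.gaierror:
        return False
```

This matters because user-agent strings are trivial to spoof: a scraper claiming to be Googlebot will pass a user-agent allowlist, but it cannot pass the reverse-and-forward DNS check.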
Questions?
If you need assistance with these changes or have questions about which bots to allow, please reach out to your Publisher Collective account manager.
Action Required By: March 2, 2026 (for Amazon Ads compliance)
Book a call with an expert
We pride ourselves on creating meaningful relationships with our publishers, understanding their priorities and customizing our solutions to meet their unique needs.





