Cloudflare is helping users block AI bots and crawlers regardless of how well they behave
In an announcement that could hardly be more timely, Cloudflare has introduced an 'easy button' that blocks all AI bots for all of its users, including those on the free tier. Cloudflare already offers a solution that blocks malicious bots and lets users decide whether well-behaved bots, AI and otherwise, may visit their website and, if so, how much of it they can access. Cloudflare defines a well-behaved bot as one that meets the following criteria, which show that it is acting in good faith:
- The bot's maintainers should maintain a public web page committing to respect robots.txt.
- There should be a verifiable range of IP addresses exclusively used by the bot.
- A stable and unique user-agent should represent the bot.
- The maintainer should respect robots.txt user-agent and wild-card entries.
- AI crawlers should respect crawl delay.
In other words, mostly things that Perplexity AI cannot be bothered to do. Perplexity AI is hardly the only tech company comfortable overstepping widely respected best practices because it finds following them inconvenient, so Cloudflare's latest announcement extends protection to cover all AI scrapers and crawlers, regardless of their behavior.
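As a rough illustration of what the robots.txt criteria listed above ask of a crawler, here is a minimal Python sketch using the standard library's `urllib.robotparser`. The robots.txt contents, the `ExampleAIBot` and `OtherBot` user-agent names, the example.com URLs, and the helper names are all made up for this example; this is not Cloudflare's verification logic, only one way a crawler could check its own user-agent entry, the wildcard entry, and the crawl delay before fetching a page.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, invented for illustration only.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /private/
Crawl-delay: 10

User-agent: *
Disallow: /admin/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def fetch_policy(parser: RobotFileParser, user_agent: str, url: str):
    """Return (allowed, crawl_delay) for this user agent and URL.

    crawl_delay is None when no delay applies to the user agent.
    """
    return parser.can_fetch(user_agent, url), parser.crawl_delay(user_agent)


parser = build_parser(ROBOTS_TXT)

# The named bot is governed by its own entry, including the crawl delay.
named_blocked = fetch_policy(parser, "ExampleAIBot", "https://example.com/private/page")
named_allowed = fetch_policy(parser, "ExampleAIBot", "https://example.com/blog")

# Any other bot falls back to the wildcard entry.
other_blocked = fetch_policy(parser, "OtherBot", "https://example.com/admin/panel")
```

A crawler acting in good faith would skip any URL where `allowed` is false and wait at least `crawl_delay` seconds between requests when a delay is set.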
To detect bots that may be disguising themselves as legitimate web browsers, Cloudflare deploys its bot detection machine learning model, which analyzes and scores traffic to determine the likelihood of it coming from a bot or a human. Users can implement a rule that challenges traffic with a bot score of 30 or less to block unwanted disguised bot traffic. Enterprise customers can report cases where Cloudflare lets misbehaving bot activity through by submitting a False Negative Feedback Loop, and all other users can use Cloudflare's new dedicated reporting tool.
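The threshold rule described above can be sketched as a simple decision function. This is only an illustration of the logic in Python, assuming that lower scores mean traffic is more likely automated; the function name, the `"challenge"` and `"allow"` labels, and the sample scores are hypothetical and do not represent Cloudflare's rule syntax or API.

```python
# Threshold taken from the article: challenge scores of 30 or less.
BOT_SCORE_THRESHOLD = 30


def action_for_score(bot_score: int, threshold: int = BOT_SCORE_THRESHOLD) -> str:
    """Return a hypothetical action label for a request's bot score.

    Assumes lower scores indicate likely bot traffic.
    """
    return "challenge" if bot_score <= threshold else "allow"


# Hypothetical sample scores, not real traffic data.
sample_scores = [5, 30, 31, 90]
decisions = [action_for_score(score) for score in sample_scores]
```

Raising the threshold would challenge more borderline traffic at the cost of inconveniencing more human visitors, which is why the cut-off is left to each site owner.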