AI Web-Crawling Bots: The Cockroaches of the Internet
The Problem
While any website might be targeted by bad crawler behavior, open source developers are disproportionately impacted, writes Niccolò Venerandi, a developer of the Plasma Linux desktop and owner of the blog LibreNews. This is because open source projects share more of their infrastructure publicly and have fewer resources than commercial products.
The Issue
The root of the problem is that many AI bots do not honor the Robots Exclusion Protocol (robots.txt), a file that tells bots which parts of a site not to crawl. The protocol was originally created for search engine bots, and compliance is entirely voluntary. As a result, many AI bots ignore the robots.txt file and continue to crawl and scrape websites, often causing outages and degraded performance.
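For reference, robots.txt is just a plain-text file served at the site root; honoring it is up to the crawler. A minimal sketch (GPTBot is OpenAI's crawler user agent, used here only as an example of a bot being singled out):

```text
# robots.txt, served at https://example.com/robots.txt
# Compliant crawlers fetch this before crawling anything else.

# Tell this specific AI crawler to stay out entirely:
User-agent: GPTBot
Disallow: /

# All other bots may crawl everything:
User-agent: *
Disallow:
```

The article's point is precisely that rules like these are advisory: a crawler that ignores the file faces no technical barrier, which is why developers have turned to active countermeasures.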
Fighting Back
In response, some developers have started fighting back in ingenious and often humorous ways. One example is Xe Iaso, a FOSS developer who built a tool called Anubis to block AI crawler bots. Anubis is a reverse-proxy proof-of-work check that must be passed before requests are allowed to hit a Git server. It blocks bots but lets through browsers operated by humans.
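Anubis runs its check in the visitor's browser; the general shape of a proof-of-work gate can be sketched in Python. This is an illustration only, with hypothetical function names, not Anubis's actual code or API: the client must burn CPU finding a nonce, while the server verifies it with a single hash.

```python
import hashlib
import itertools

def solve(challenge: str, difficulty: int) -> int:
    """Client side: brute-force a nonce so that SHA-256(challenge + nonce)
    starts with `difficulty` hex zeros. Cheap for one human's browser,
    expensive for a crawler making thousands of requests."""
    target = "0" * difficulty
    for nonce in itertools.count():
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce

def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: a single hash check, regardless of how much
    work the solver had to do."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)
```

The asymmetry is the point: verification costs one hash, while solving costs many, so the toll scales with request volume and mostly taxes high-volume scrapers.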
Enter the God of Graves
The funny part: Anubis is the name of the god in Egyptian mythology who leads the dead to judgment. The project has spread like wildfire through the FOSS community: Iaso shared it on GitHub on March 19, and within just a few days it collected 2,000 stars, 20 contributors, and 39 forks.
Vengeance as Defense
The instant popularity of Anubis shows that Iaso's pain is not unique. In fact, Venerandi shared story after story of other developers experiencing similar issues. Drew DeVault, founder and CEO of SourceHut, described spending "from 20-100% of my time in any given week mitigating hyper-aggressive LLM crawlers at scale."
Conclusion
AI web-crawling bots are indeed the cockroaches of the internet, and developers are fighting back with cleverness and a touch of humor. While a complete fix may not be possible, these creative defenses provide both practical protection and a sense of justice.
FAQs
Q: What is the Robots Exclusion Protocol (robots.txt) file?
A: The robots.txt file implements the Robots Exclusion Protocol, a standard that tells crawlers which parts of a website they should not visit. Compliance is voluntary.
Q: What is Anubis?
A: Anubis is a reverse proxy proof-of-work check that must be passed before requests are allowed to hit a Git server. It blocks bots but lets through browsers operated by humans.
Q: What is the goal of the Nepenthes tool?
A: The goal of the Nepenthes tool is to trap crawlers in an endless maze of fake content, wasting their time and resources.
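Nepenthes' internals aren't described here, but the "endless maze" idea can be sketched: deterministically generate fake pages whose links lead only to more fake pages, so a crawler that follows them never escapes. A toy Python illustration (hypothetical, not Nepenthes' actual code):

```python
import hashlib
import random

def maze_page(path: str, links_per_page: int = 5) -> str:
    """Generate a fake HTML page for `path` whose links point only at
    deeper fake paths. Seeding the RNG from the path makes each page
    stable across requests, so the maze looks like a real site."""
    seed = hashlib.sha256(path.encode()).digest()
    rng = random.Random(seed)
    links = [
        f"{path.rstrip('/')}/{rng.randrange(10**6)}"
        for _ in range(links_per_page)
    ]
    body = "".join(f'<a href="{link}">{link}</a>\n' for link in links)
    return f"<html><body>\n{body}</body></html>"
```

A server would route unknown paths to a generator like this; every page a misbehaving crawler fetches yields more dead-end links, consuming its crawl budget instead of the site's real resources.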
Q: What is the purpose of the AI Labyrinth tool?
A: The AI Labyrinth tool is intended to slow down, confuse, and waste the resources of AI crawlers and other bots that don't respect "no crawl" directives.

