LLM crawlers continue to DDoS SourceHut.

Tea@programming.dev · 16 hours ago


thatsnothowyoudoit@lemmy.ca · 14 hours ago

We return NGINX’s non-standard 444 status to every LLM crawler we see, which closes the connection without sending any response.

Caddy has a similar “close connection” option called “abort” as part of the static response.

HAProxy has the “silent-drop” option which also closes the TCP connection silently.

I’ve found that crawling attempts, especially attacks, end more quickly with this approach, but my sample size is relatively small.
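The 444 approach boils down to this: look at the request, and if it comes from a crawler you don’t want, close the socket without writing a single byte. Here is a minimal Python sketch of that idea at the application level. It assumes the crawler identifies itself in its User-Agent header. The `BLOCKED_AGENTS` list and port 8080 are illustrative assumptions, not anyone’s actual config, and this is not how NGINX implements 444 internally.

```python
import asyncio

# Hypothetical blocklist of User-Agent substrings; maintain your own,
# since crawler names change over time.
BLOCKED_AGENTS = ("GPTBot", "ClaudeBot", "CCBot", "Bytespider")


def is_blocked(user_agent: str) -> bool:
    """True if the User-Agent contains any blocklisted substring (case-insensitive)."""
    ua = user_agent.lower()
    return any(name.lower() in ua for name in BLOCKED_AGENTS)


def parse_user_agent(header_block: bytes) -> str:
    """Extract the User-Agent value from a raw HTTP/1.x request header block."""
    for line in header_block.decode("latin-1").split("\r\n")[1:]:
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "user-agent":
            return value.strip()
    return ""


async def handle(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    try:
        headers = await reader.readuntil(b"\r\n\r\n")
    except (asyncio.IncompleteReadError, asyncio.LimitOverrunError):
        writer.close()
        return
    if is_blocked(parse_user_agent(headers)):
        # Analogous to NGINX's 444: send no status line, no body, just close.
        writer.close()
        await writer.wait_closed()
        return
    body = b"hello\n"
    writer.write(
        b"HTTP/1.1 200 OK\r\nContent-Length: %d\r\nConnection: close\r\n\r\n" % len(body)
        + body
    )
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def main() -> None:
    # Port 8080 on localhost is an arbitrary choice for the sketch.
    server = await asyncio.start_server(handle, "127.0.0.1", 8080)
    async with server:
        await server.serve_forever()
```

To try it, run `asyncio.run(main())` and point curl at port 8080 with `-A GPTBot`; curl should report an empty reply. User-Agent matching only catches crawlers that announce themselves, so it won’t help against traffic routed through residential proxies.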

Edit: we do this because too often we’ve seen them ignore robots.txt. They believe all data is theirs. I do not.

mesamune@lemmy.world · 10 hours ago

I had the same issue. OpenAI was just slamming my tiny little server, ignoring robots.txt. I had to install an LLM black hole and put very basic password protection in front of my git server frontend, since the crawler kept hammering it.
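“Very basic password protection” usually means HTTP Basic authentication in front of the frontend. A minimal Python sketch of the server-side check, assuming a single shared username and password; the credentials and the realm name `git` are hypothetical placeholders, and in practice a reverse proxy would normally do this for you.

```python
import base64
import hmac
from typing import Dict, Optional

# Hypothetical credentials; load real ones from config, not source code.
EXPECTED_USER = "friend"
EXPECTED_PASS = "not-a-crawler"


def check_basic_auth(authorization_header: Optional[str]) -> bool:
    """True if an HTTP Basic Authorization header carries the expected credentials."""
    if not authorization_header:
        return False
    scheme, _, encoded = authorization_header.partition(" ")
    if scheme.lower() != "basic":
        return False
    try:
        decoded = base64.b64decode(encoded.strip(), validate=True).decode("utf-8")
    except (ValueError, UnicodeDecodeError):  # binascii.Error is a ValueError
        return False
    user, sep, password = decoded.partition(":")
    if not sep:
        return False
    # compare_digest avoids leaking how much of the secret matched via timing.
    user_ok = hmac.compare_digest(user.encode(), EXPECTED_USER.encode())
    pass_ok = hmac.compare_digest(password.encode(), EXPECTED_PASS.encode())
    return user_ok and pass_ok


def challenge_headers() -> Dict[str, str]:
    """Headers to send with a 401 response so browsers prompt for a password."""
    return {"WWW-Authenticate": 'Basic realm="git"'}
```

Crawlers that don’t authenticate just collect 401s, which are cheap to serve compared with rendering git history pages.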

As much as I don’t like Google, I did see their crawler come in, read robots.txt, and make no other requests for a week. That’s how it should work.
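The well-behaved pattern described here is simple: fetch robots.txt first, then check every URL against it before requesting it. A minimal sketch using Python’s standard `urllib.robotparser`. The robots.txt contents and the `example.org` URLs are assumptions for illustration, and the rules are parsed from a string rather than fetched over the network.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep one LLM crawler out of the whole site,
# and keep every other crawler out of the git frontend.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /git/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """A polite crawler checks this before every request and skips disallowed URLs."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)
for ua, url in [
    ("GPTBot", "https://example.org/about"),
    ("Googlebot", "https://example.org/about"),
    ("Googlebot", "https://example.org/git/repo"),
]:
    print(ua, url, allowed(parser, ua, url))
```

Nothing enforces this on the server side; it only works if the crawler chooses to run a check like this, which is exactly why the thread resorts to dropping connections.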

Treczoks@lemmy.world · 15 hours ago

I wonder how much of the load problems I observe with lemmy.world are due to AI crawlers.

Roguelazer@lemmy.world · 15 hours ago

The companies that run these residential proxy networks are sketchy as shit and in a better world would be criminally prosecuted. They’re tricking random low-information users into installing VPNs and other software with backdoors that turn them into a veritable botnet.


LLM crawlers continue to DDoS SourceHut | sr.ht status