Gopher Menu
----------------------------------------
AI web scrapers / content stealers are out of control
February 19th, 2026
----------------------------------------
On 02/16/2026 at 17:19 I was rudely awakened from a nap by an automated
email from my hosting provider... stating that my VPS had "exceeded the
notification threshold (90) for CPU Usage by averaging 99.9% for the
last 2 hours."
Upon investigating I discovered that some dumbass AI crawler was sending
hundreds of requests per second to my gopher proxy, loading bitreich's
AI trap (LOL!) and also bitreich's PHAROS (Internet Archive over gopher)
instance. Humorous - but aside from wasting the bandwidth of all
involved parties, it was eating 100% of CPU, 100% of available FDs,
100% of temp space, 100% of RAM, and a sizable chunk of swap space on
gopher.zcrayfish.soy . . .
I immediately went to work dealing with the bots: the gopherproxy now
uses temp space MUCH more efficiently and has garbage collection there
as well... Requests sent via the proxy that attempt to load the AI
honeytrap now return a nice random selection of tarbombs, deflate
bombs, markov babble, and other malicious anti-bot content, without
even sending a request off to bitreich. Additionally I have configured
the gopherproxy to answer some of the PHAROS URLs with HTTPS redirects
back to the Internet Archive's HTTPS server.
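For the curious, the deflate-bomb half of that can be sketched in a few
lines of Python. This is a hypothetical illustration of the general
technique, not the actual gopherproxy code; the sizes and function name
are made up:

```python
# Sketch of a "deflate bomb": a few KB on the wire that a naive scraper
# inflates to many MB in memory. Illustrative only.
import gzip
import io

def make_deflate_bomb(decompressed_mb: int = 10) -> bytes:
    """Compress a long run of zeros; gzip shrinks it ~1000x."""
    buf = io.BytesIO()
    chunk = b"\0" * (1024 * 1024)  # 1 MiB of zeros, maximally compressible
    with gzip.GzipFile(fileobj=buf, mode="wb", compresslevel=9) as gz:
        for _ in range(decompressed_mb):
            gz.write(chunk)
    return buf.getvalue()

bomb = make_deflate_bomb(10)
print(len(bomb))  # roughly 10 KB on the wire for 10 MiB decompressed
```

Serve that with "Content-Encoding: gzip" and the bot's HTTP library
happily decompresses the whole thing before handing it to the scraper.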
That being said, the pressure the bots are putting on this server (and
to a lesser extent on other gopher servers via the proxy) is relentless;
I am considering taking drastic measures to keep operating the proxy
without so much bot abuse...
Ideas I am floating:
Allowing proxy access to only:
  - authenticated users (BASIC? DIGEST? IDENT?)
  - the latest versions of chrome/safari/firefox (other browsers exempt)
  - UAs that accept cookies (but the bots probably accept them, IDK)
  - UAs that accept javascript data storage (ditto)
  - UAs that pass an Anubis challenge . . .
  - Referer header enforcement (probably a non-starter...)
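Of those, the Anubis-style challenge is the one that doesn't depend on
trusting anything the bot sends; at its core it is just a SHA-256
proof-of-work. A minimal sketch of the idea - function names, the
challenge format, and the difficulty parameter are all illustrative
here, not Anubis's actual implementation:

```python
# Sketch of a proof-of-work gate: the proxy issues a random challenge,
# and only serves the request once the client returns a nonce such that
# SHA-256(challenge:nonce) starts with `difficulty` zero bits.
import hashlib
import os

DIFFICULTY = 12  # leading zero bits; tune for bot cost vs. human UX

def issue_challenge() -> str:
    return os.urandom(16).hex()

def verify(challenge: str, nonce: int, difficulty: int = DIFFICULTY) -> bool:
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return int.from_bytes(digest, "big") >> (256 - difficulty) == 0

def solve(challenge: str, difficulty: int = DIFFICULTY) -> int:
    """What the client-side JS would do: brute-force a nonce."""
    nonce = 0
    while not verify(challenge, nonce, difficulty):
        nonce += 1
    return nonce

chal = issue_challenge()
nonce = solve(chal)       # ~2^DIFFICULTY hashes on average
assert verify(chal, nonce)
```

The cost is trivial for one human pageview but adds up fast for a
crawler firing hundreds of requests per second - and it works the same
no matter how many IPs the bot rotates through.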
These fuckers use a different IPv4 address with every single request,
and most of them are even in different netblocks and ASNs. . . .
I'm still thinking about this and would appreciate comments here, on
IRC, or via email with other anti-bot ideas... more malware I can send
the ones I detect, detection ideas, etc.
----------------------------------------
(DIR) Back to phlog index
(DIR) gopher.zcrayfish.soy gopher root
This phlog entry has been read 93 times.
(?) Comments are enabled for this post, select here to leave yours
Comments have been left on this post:
I completely agree. I had to take down my tiny gitweb instance a
couple of months ago (wrote about it here:
gopher://thelambdalab.xyz/0phlog/2025-12-06-Is-nowhere-safe-anymore?.txt
) for very similar reasons. AI scrapers are ruining what was left to
ruin of the web. (They'd be ruining gopher too if someone thought it
was worth it.) [plugd]
Posted Fri Feb 20 09:30:05 UTC 2026 by 129-132-29-28.net4.ethz.ch.
------------------------------------------------------------------------
Forgot to say: my mitigation strategy was basically to take down
gitweb, throttle all web traffic to 1kb _total_, and outright block a
huge number of user-agent strings. I'm not running a proxy though, so
I'm okay with the nuclear option. (I probably went a bit overboard
too, but what can I say - I was mad and my VPS was about to be cut
off..) [plugd]
Posted Fri Feb 20 09:36:47 UTC 2026 by 129-132-29-28.net4.ethz.ch.
------------------------------------------------------------------------
q
Posted Fri Feb 20 22:32:26 UTC 2026 by 2600:4040:2c64:2b00::603
------------------------------------------------------------------------