The majority of the traffic on the web is from bots. For the most part, these bots are used to discover new content. These are RSS Feed readers, search engines crawling your content, or nowadays AI bo
The article writer kind of complains that they’re having to serve a 10MB file, which is the result of the gzip compression. If that’s a problem, they could switch to bzip2. It’s available pretty much everywhere that gzip is available and it packs the 10GB down to 7506 bytes.
That’s not a typo. bzip2 is way better with highly redundant data.
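For anyone who wants to check the gzip vs. bzip2 gap without writing 10GB to disk, here is a rough sketch using Python's standard `zlib` and `bz2` modules. The function name `compressed_sizes` and the 16 MiB test size are my own choices, not anything from the article, and exact byte counts will vary with the input size and library version.

```python
import bz2
import zlib

CHUNK = 1024 * 1024  # feed the compressors 1 MiB of zeros at a time


def compressed_sizes(total_bytes):
    """Return (gzip_size, bzip2_size) for a stream of total_bytes zero bytes."""
    zeros = bytes(CHUNK)
    # wbits=31 asks zlib for a gzip container rather than a raw zlib stream
    gz = zlib.compressobj(9, zlib.DEFLATED, 31)
    bz = bz2.BZ2Compressor(9)
    gz_size = 0
    bz_size = 0
    remaining = total_bytes
    while remaining > 0:
        block = zeros[:min(CHUNK, remaining)]
        gz_size += len(gz.compress(block))
        bz_size += len(bz.compress(block))
        remaining -= len(block)
    # flush() emits whatever the compressors are still buffering
    gz_size += len(gz.flush())
    bz_size += len(bz.flush())
    return gz_size, bz_size


if __name__ == "__main__":
    gz_size, bz_size = compressed_sizes(16 * 1024 * 1024)
    print("gzip bytes: ", gz_size)
    print("bzip2 bytes:", bz_size)
```

Nothing is held in memory beyond one 1 MiB chunk, so raising the size toward 10GB only costs time.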
Gzip encoding has been part of the HTTP protocol for a long time and every server-side HTTP library out there supports it, and phishing/scraper bots are built with server-side libraries, not browser engines.
Further, judging by the example in his article, he’s not using gzip with maximum compression when generating the zip bomb files: he needs to add -9 to the gzip command line to get the best compression (but it will be slower).
(I tested this and it made no difference at all).
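The "no difference" result is easy to reproduce in a scaled-down form. This is only a sketch under my own assumptions: `gzip_size_of_zeros` is a made-up helper, it uses Python's `zlib` rather than the gzip command line, and it compresses 16 MiB of zeros instead of 10GB.

```python
import zlib


def gzip_size_of_zeros(total_bytes, level):
    """Size in bytes of a gzip stream holding total_bytes zeros at the given level."""
    chunk = bytes(1024 * 1024)
    # 31 selects the gzip header and trailer; level is the same 1-9 scale as gzip -1 .. -9
    comp = zlib.compressobj(level, zlib.DEFLATED, 31)
    size = 0
    remaining = total_bytes
    while remaining > 0:
        block = chunk[:min(len(chunk), remaining)]
        size += len(comp.compress(block))
        remaining -= len(block)
    return size + len(comp.flush())


if __name__ == "__main__":
    n = 16 * 1024 * 1024
    for level in (6, 9):  # 6 is the usual default, 9 is the maximum
        print("level", level, "->", gzip_size_of_zeros(n, level), "bytes")
```

With input this redundant, the default level already finds the longest matches deflate allows, so there is little or nothing left for -9 to gain.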
Brotli gets it to 8.3K, and is supported in most browsers, so there’s a chance scrapers also support it.
I believe he’s returning a gzip HTTP response stream, not just a file payload that the requester then downloads and decompresses.
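To illustrate the difference between a gzip response stream and a plain file download: what makes a client inflate the body on its own is the `Content-Encoding: gzip` header. Below is a minimal sketch with Python's built-in `http.server`. It is not the article author's setup; `BombHandler`, `make_server`, and the harmless 1 MiB payload are all placeholders of mine.

```python
import gzip
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumption for illustration: a harmless 1 MiB of zeros, compressed once at startup.
PAYLOAD_SIZE = 1024 * 1024
BOMB = gzip.compress(bytes(PAYLOAD_SIZE), compresslevel=9)


class BombHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        # This header marks the body as a gzip-encoded representation,
        # so a client that honors it decompresses while reading the response.
        self.send_header("Content-Encoding", "gzip")
        self.send_header("Content-Length", str(len(BOMB)))
        self.end_headers()
        self.wfile.write(BOMB)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def make_server(port=0):
    """Bind to localhost only; port 0 lets the OS pick a free port."""
    return HTTPServer(("127.0.0.1", port), BombHandler)


if __name__ == "__main__":
    server = make_server(8080)
    print("compressed body:", len(BOMB), "bytes for", PAYLOAD_SIZE, "bytes of zeros")
    server.serve_forever()
```

Without the `Content-Encoding` header the same bytes would just be a small .gz file that the requester has to choose to unpack.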
bzip2 isn’t used in HTTP compression.
Brotli is an option, and it’s comparable to bzip2. Brotli works in most browsers, so hopefully these bots would support it.
I just tested it, and a 10GB file full of zeroes is only 8.3K compressed. That’s pretty good, though a little bigger than bzip2.
TIL why I’m gonna start learning more about bzip2. Thanks!