The one-liner:

dd if=/dev/zero bs=1G count=10 | gzip -c > 10GB.gz

This is brilliant.
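What makes it work is that /dev/zero is maximally compressible: gzip encodes long runs of identical bytes in almost no space, so the huge decompressed payload fits in a tiny file. A minimal sketch at 1 GiB scale (smaller than the original one-liner, same effect):

```shell
# 1 GiB of zeros compresses to roughly a thousandth of its size.
dd if=/dev/zero bs=1M count=1024 2>/dev/null | gzip -c > 1GB.gz
gzip -l 1GB.gz   # lists compressed vs. uncompressed sizes and the ratio
```

A naive scraper that transparently decompresses the response will try to inflate the full 1 GiB (or 10 GB with the original command) in memory or on disk.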

  • tal@lemmy.today · 12 hours ago

    Anyone who writes a spider that’s going to inspect all the content out there is already going to have to have dealt with this, along with about a bazillion other kinds of oddball or bad data.

    • lennivelkant@discuss.tchncs.de · 7 hours ago

      That’s the usual case with arms races: Unless you are yourself a major power, odds are you’ll never be able to fully stand up to one (at least not on your own, but let’s not stretch the metaphor too far). Often, the best you can do is deter other, minor powers and hope the major ones never have a serious intent to bring you down.

      In this specific case, the number of potential minor “attackers” and the low hurdle to “attack” make it attractive to try to overwhelm at least the amateurs. You’ll never get the pros; you just hope they don’t bother you too much.

    • catloaf@lemm.ee · 10 hours ago

      Competent ones, yes. Most developers aren’t competent, scraper writers even less so.

      • idriss@lemm.ee · 1 hour ago

        That’s true. Scraping is a gold mine for the people that don’t know it. I worked for a place which crawls the internet and beyond (it also fetches some internal dumps we pay for). There is no chance a zip bomb would crash the workers, as there are strict timeouts and smell tests (and even if one slipped through, it would crash an ECS task at worst, and we would be alerted to fix it within a short time). We were as honest as it gets, though: following GDPR, honoring the robots.txt file, no spiders or scanners allowed, only the home page, to extract some insights.

        I am aware of some big name EU non-software companies very interested in keeping an eye on some key things that are only possible with scraping.
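The strict timeouts and size caps described above can be sketched in shell. This is an illustrative guard, not the poster’s actual pipeline; the 10 MiB cap and 30-second budget are assumed values, and `suspicious.gz` stands in for a fetched response body:

```shell
# Create a small test bomb: 100 MiB of zeros compressed down to ~100 KiB.
dd if=/dev/zero bs=1M count=100 2>/dev/null | gzip -c > suspicious.gz

# Worker-side guard: a hard byte cap plus a wall-clock timeout, so a
# bomb can neither fill the disk nor stall the worker indefinitely.
LIMIT=$((10 * 1024 * 1024))            # keep at most 10 MiB decompressed
timeout 30 gunzip -c suspicious.gz | head -c "$LIMIT" > body.html
# head stops reading after LIMIT bytes; gunzip then exits on SIGPIPE,
# so only a bounded amount of the payload is ever inflated.
```

The same idea applies at fetch time: a crawler can refuse oversized downloads up front (e.g. curl’s `--max-filesize`) and cap the inflated output separately, since the compressed size of a bomb tells you almost nothing about its decompressed size.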