• FaceDeer@fedia.io
    3 days ago

    Stable Diffusion was trained on the LAION-5B image dataset, which, as the name implies, has around 5 billion images in it. The resulting model was around 3 gigabytes. If this is indeed a “compression” algorithm, then it’s the most magical and physics-defying one ever, as it manages to compress images to less than one byte each.
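
    For anyone who wants to check the arithmetic, here is a rough sketch. The two figures are the approximate ones quoted above (not measured here), and decimal gigabytes are an assumption.

```python
# Approximate figures quoted in the comment above; not measured here.
NUM_IMAGES = 5_000_000_000       # "around 5 billion images"
MODEL_SIZE_BYTES = 3 * 10**9     # "around 3 gigabytes" (decimal GB assumed)


def bytes_per_image(model_bytes: int, num_images: int) -> float:
    """Average number of model bytes available per training image."""
    return model_bytes / num_images


avg_bytes = bytes_per_image(MODEL_SIZE_BYTES, NUM_IMAGES)
print(f"{avg_bytes:.2f} bytes per image")     # 0.60 bytes per image
print(f"{avg_bytes * 8:.1f} bits per image")  # 4.8 bits per image
```

    This is only an average across the whole dataset; it says nothing about how the model's capacity is actually spread over individual images.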

    Besides, even if we consider the model itself to be fine, they did not buy all the media they trained the model on.

    That is a completely separate issue. You can sue them for copyright violation over the actual acts of copyright violation. If an artist steals a bunch of art books to study, then sue him for stealing the art books, but you can’t extend that to say that anything he drew based on that learning is also a copyright violation, or that the knowledge inside his head is a copyright violation.

    • enumerator4829@sh.itjust.works
      3 days ago

      You assume a uniform distribution. I’m guessing that it’s not. The question isn’t “Does the model contain compressed representations of all works it was trained on?” Retaining enough information about any single image is enough to be a copyright issue.

      Besides, image models aren’t as obviously flawed in this respect as LLMs are. LLMs are just broken in this regard, because it only takes a handful of retained bytes to violate copyright.

      I think there will be a “find out” stage fairly soon. Currently, the US projects lots and lots of soft power on the rest of the world to enforce copyright terms favourable to Disney and friends. Accepting copyright violations for AI will erode that power internationally over time.

      Personally, I do think we need to rework copyright anyway, so I’m not complaining that much. Change the law, go ahead and make the high seas legal. But set against current copyright laws, most large datasets and most models constitute copyright violations. Just imagine the shitshow if OpenAI were a European company training on material from Disney.

    • HereIAm@lemmy.world
      3 days ago

      There’s a difference between lossy and lossless. You can compress anything down to a single bit if you so wish, just don’t expect to get everything back. That’s how lossy compression works.
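
      A minimal sketch of the difference, using Python’s standard `zlib` for the lossless side. The `one_bit` function and the sample data are made up purely for illustration.

```python
import zlib

# Stand-in for image data: 1024 bytes cycling through every byte value.
data = bytes(range(256)) * 4

# Lossless: decompressing gives back exactly the bytes that went in.
packed = zlib.compress(data)
restored = zlib.decompress(packed)
print(restored == data)  # True


def one_bit(payload: bytes) -> int:
    """Extreme lossy 'compression': keep only whether the mean byte exceeds 127."""
    return 1 if sum(payload) / len(payload) > 127 else 0


# Lossy: a single bit survives, and the original cannot be rebuilt from it.
bit = one_bit(data)
print(bit)  # 1
```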

      • yetAnotherUser@discuss.tchncs.de
        3 days ago

        It’s perfectly legal to compress something to a single bit and publish it.

        Hell, if I compute and publish the average color of any copyrighted image, that is at least 24 bits of information. That’s lossy compression, yet it’s legal.
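
        Roughly what that looks like in code, as a sketch: the image is assumed to be a plain list of 8-bit RGB tuples, and the 2x2 sample is invented for the example.

```python
from typing import Iterable, Tuple

Pixel = Tuple[int, int, int]  # 8-bit (R, G, B), so 24 bits per pixel


def average_color(pixels: Iterable[Pixel]) -> Pixel:
    """Collapse an image to a single RGB value (integer mean per channel)."""
    total_r = total_g = total_b = 0
    count = 0
    for r, g, b in pixels:
        total_r += r
        total_g += g
        total_b += b
        count += 1
    if count == 0:
        raise ValueError("image has no pixels")
    return (total_r // count, total_g // count, total_b // count)


# Hypothetical 2x2 image: red, green, blue, and white pixels.
image = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 255)]
mean_rgb = average_color(image)
print(mean_rgb)  # (127, 127, 127)
```

        Whatever the size of the input, the output is three 8-bit channels, so 24 bits, and nothing about the original layout can be recovered from it.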