Granted, I really don’t know much about how all this works, but the thought occurred to me that Lemmy, as wonderfully open as it is and without any kind of ‘disappearing messages’ or other privacy-protecting functionality, is basically a smorgasbord for AI scrapers. Or am I (hopefully) wrong about this?
Yes, but you are mistaken if you think your data is safe on closed platforms.
If you post it on the internet, you have to assume it’s gonna be there forever.
*laughs in private tracker community*
Plenty of trackers have gone down and taken their entire history with them. When baconBits shut down, the admins toyed with the idea of making a backup of the forums for the people who wanted it, but that never happened. Maybe it lives on inside some hard drive squirreled away somewhere, but since the forums were private and only accessible to members, they were never scraped, and any history of them officially doesn’t exist.
In the limit, all data is either destroyed or made public—privacy is always temporary.
Personal opinion: this is much more applicable to paper data than it is to digital data.
Magnetic tape storage has one of the longest lifespans before data corruption sets in, and even that seems to be about thirty years at best. Even under ideal storage conditions, that is a very short shelf life.
Without regular backups, digital data degrades rather quickly and is difficult to recover after corruption.
Beyond that, quickly changing technology standards make it harder to recover old data. PATA/IDE was the standard 20 years ago; how many people realistically have the tools available to recover an IDE drive when all they have is a slick laptop with a USB-C port? Specialized tools must be used to recover even from recent types of media.
Here’s a more nuanced approach. Once this message is posted, it’s public. During the same day, it will be copied to a bunch of servers across the fediverse. It’s easily available to everyone who cares to look for it. After a few decades, most copies of the message will be gone, but maybe one or two will still remain tucked away somewhere. It’s still technically public, but it’s getting a bit rare. That’s OK though, because nobody cares about 30-year-old online ramblings written on some archaic social media that got replaced by the New Cool Thing.
After a hundred years or so, it’s highly likely that almost every record of this conversation is permanently gone. Maybe there’s a data historian who has a personal copy of the entire fediverse. What if that one historian forgets that their Crystalline Omni-Relational Uni-Protonic Tachyon storage, containing the only copy, was in the pocket of the trousers that went into the washing machine? When they hear the spaceship keys clanging inside the washing machine, they stop the cycle, but by that point the ‘original manuscript’ is already gone. All you have left are some references, summaries, interpretations, translations, etc. Nobody knows what the original actually said, but historians just love to debate and speculate about it anyway.
I believe the point is, once some data is publicly available, even if you try to delete it, you can never be sure all copies are truly gone. Like you said, maybe it lives on somebody’s hard drive, maybe some other user managed to scrape it for their own personal use, maybe they screenshotted the most compromising posts, etc. You can never be sure it’s gone.