• 0 Posts
  • 8 Comments
Joined 10 months ago
Cake day: May 14th, 2024



  • In case anyone is unfamiliar, Aaron Swartz downloaded a huge number of academic journal articles from JSTOR. This wasn’t for training AI, though. Swartz was an advocate for open access to scientific knowledge. Many papers are “open access” and yet are not readily available to the public.

    Much of what he downloaded was open-access, and he had legitimate access to the system via his university affiliation. The entire case was a sham. They charged him with wire fraud, unauthorized access to a computer system, breaking and entering, and a host of other trumped-up charges, because he…opened an unlocked closet door and used an ethernet jack from there. The fucking Secret Service was involved.

    https://en.wikipedia.org/wiki/Aaron_Swartz#Arrest_and_prosecution

    The federal prosecution involved what was characterized by numerous critics (such as former Nixon White House counsel John Dean) as an “overcharging” 13-count indictment and “overzealous”, “Nixonian” prosecution for alleged computer crimes, brought by then U.S. Attorney for Massachusetts Carmen Ortiz.

    Nothing Swartz did is anywhere close to the abuse by OpenAI, Meta, etc., who openly admit they pirated all their shit.


  • Again: What is the percent “accurate” of an SEO-infested blog?

    I don’t think that’s a good comparison in context. If Forbes replaced all their bloggers with ChatGPT, that might very well be a net gain. But that’s not the use case we’re talking about. Nobody goes to Forbes as their first step for information anyway (I mean…I sure hope not…).

    The question shouldn’t be “we need this to be 100% accurate and never hallucinate” but rather “what web pages or resources were used to create this answer?”, followed by doing what we should always be doing anyway: checking the sources to see if they at least seem trustworthy.

    Correct.

    If we’re talking about an AI search summarizer, then the accuracy lies not in how correct the information is in regard to my query, but in how closely the AI summary matches the cited source material. Kagi does this pretty well. Last I checked, Bing and Google did it very badly. Not sure about Samsung.

    On top of that, the UX is critically important. In a traditional search engine, the source comes before the content. I can implicitly ignore any results from Forbes blogs. Even Kagi shunts the sources into footnotes. That’s not a great UX because it elevates unvetted information above its source. In this context, I think it’s fair to consider the quality of the source material as part of the “accuracy”, the same way I would when reading Wikipedia. If Wikipedia replaced their editors with ChatGPT, it would most certainly NOT be a net gain.
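    That notion of accuracy as faithfulness to cited sources can even be sketched mechanically. Here’s a toy Python check (every name here is invented for illustration; real systems use proper entailment models, not word overlap) that flags summary sentences sharing little vocabulary with the cited source:

    ```python
    # Naive "faithfulness" sketch: score each summary sentence by how much
    # of its vocabulary appears in the cited source text. Illustrative only.
    import re

    def support_score(sentence: str, source: str) -> float:
        """Fraction of the sentence's words that also occur in the source."""
        tokenize = lambda text: set(re.findall(r"[a-z0-9']+", text.lower()))
        words = tokenize(sentence)
        if not words:
            return 0.0
        return len(words & tokenize(source)) / len(words)

    def flag_unsupported(summary: str, source: str, threshold: float = 0.6):
        """Return summary sentences with low overlap against the source --
        hallucination candidates worth checking by hand."""
        sentences = re.split(r"(?<=[.!?])\s+", summary.strip())
        return [s for s in sentences if support_score(s, source) < threshold]

    source = "Kagi cites its sources inline. Bing and Google summaries often drift."
    summary = "Kagi cites sources inline. The moon is made of cheese."
    print(flag_unsupported(summary, source))  # -> ['The moon is made of cheese.']
    ```

    Word overlap is obviously a crude proxy (it misses paraphrase and negation), but it illustrates why “matches the cited source” is a checkable property in a way that “is objectively true” isn’t.
    
    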


  • 99.999% would be fantastic.

    90% is not good enough to be a primary feature that discourages inspection (like a naive chatbot).

    What we have now is like…I dunno, anywhere from <1% to maybe 80% depending on your use case and definition of accuracy, I guess?

    I haven’t used Samsung’s stuff specifically. Some web search engines do cite their sources, and I find that to be a nice little time-saver. With the prevalence of SEO spam, most results have like one meaningful sentence buried in 10 paragraphs of nonsense. When the AI can effectively extract that tiny morsel of information, it’s great.

    Ideally, I don’t ever want to hear an AI’s opinion, and I don’t ever want information that’s baked into the model from training. I want it to process text with an awareness of complex grammar, syntax, and vocabulary. That’s what LLMs are actually good at.


  • Google as an organization is simply dysfunctional. Everything they make is either some cowboy bullshit with no direction, or else it’s death by committee à la Microsoft.

    Google has always had a problem with incentives internally, where the only way to get promoted or get any recognition was to make something new. So their most talented devs would make some cool new thing, and then it would immediately stagnate and eventually die of neglect as they either got their promotion or moved on to another flashy new thing. If you’ve ever wondered why Google kills so many products (even well-loved ones), this is why. There’s no glory in maintaining someone else’s work.

    But now I think Google has entered a new phase, and they are simply the new Microsoft – too successful for their own good, and bloated as a result, with too many levels of management trying to justify their existence. I keep thinking of this article by a Microsoft engineer around the time Vista came out, about how something like 40 people were involved in redesigning the power options in the start menu, how it took over a year, and how it was an absolute shitshow. It’s an eye-opening read: https://moishelettvin.blogspot.com/2006/11/windows-shutdown-crapfest.html



  • Almost certainly, yes.

    People on Mastodon were not happy about those statements and called Proton out relentlessly on every post Proton made. This is Proton running away with their tail between their legs, back to platforms where they have more control and/or that are already full of right-wing nutjobs.

    If anyone’s looking for secure email, look at tuta.com instead. The email service is very similar in terms of UX and offers better encryption. They don’t offer the rest of Proton’s suite, but…maybe that’s a good thing? I mean, do you want to get locked into an ecosystem?