
  • I recently got a solar power system and came to the conclusion that if you can sell power back to the grid (not everyone can) for some reasonable fraction of what it costs to buy, then it will always be worth staying connected (assuming you already are).

    Quite simply, if you have enough solar capacity to get you through the winter (no house is going to have months of battery storage), then you will always be generating far more than you need in the summer. Selling this excess will easily cover any costs associated with being on the grid.

    Also, at current prices batteries are good for backup power only: it’s always cheaper to sell excess power to the grid during the day and buy it back at night than it is to have enough battery capacity to get through the night. I worked out it would take 40 years for our battery to pay for itself (assuming it held its full capacity for 40 years…) but less than 10 years for the rest of the system.
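
    To make that arithmetic concrete, here is a rough payback sketch in Python. Every number in it is a made-up placeholder, not our actual prices; the point is just the shape of the calculation.

    ```python
    # Rough battery-vs-panels payback arithmetic. All numbers are placeholders;
    # plug in your own install costs and tariffs.

    import_price = 0.30        # cost to buy 1 kWh from the grid
    export_price = 0.15        # feed-in price for 1 kWh sold back

    # Battery: it only earns the gap between buying at night and selling by day.
    battery_cost = 12000.0
    battery_usable_kwh = 10.0  # usable capacity cycled once per night
    battery_saving_per_year = battery_usable_kwh * (import_price - export_price) * 365
    print(f"battery payback: {battery_cost / battery_saving_per_year:.0f} years")  # ~22

    # Panels + inverter: each kWh is either used directly (worth the import price)
    # or exported (worth the feed-in price).
    system_cost = 9000.0
    annual_generation_kwh = 7000.0
    self_use_fraction = 0.4
    annual_value = annual_generation_kwh * (
        self_use_fraction * import_price + (1 - self_use_fraction) * export_price
    )
    print(f"panels payback:  {system_cost / annual_value:.0f} years")  # ~6
    ```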






  • Consider the implications if ChatGPT started saying “I don’t know” to even 30% of queries – a conservative estimate based on the paper’s analysis of factual uncertainty in training data. Users accustomed to receiving confident answers to virtually any question would likely abandon such systems rapidly.

    I think we would just be more careful with how we used the technology. E.g. don’t autocomplete code unless a reasonable confidence threshold is met (see the sketch below).

    I would argue that it’s more useful to have a system that says it doesn’t know half the time than a system that’s confidently wrong half the time.
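
    As a rough sketch of what that gating could look like (request_completion is a stand-in for whatever completion API is in use, and the 0.75 cutoff is arbitrary):

    ```python
    import math

    CONFIDENCE_THRESHOLD = 0.75  # arbitrary cutoff; picking it well is the hard part

    def request_completion(prefix):
        """Placeholder for a real model call. Assumed to return the completed
        text plus per-token log-probabilities; here it returns canned data."""
        return prefix + " ...", [math.log(0.9), math.log(0.5), math.log(0.8)]

    def maybe_autocomplete(prefix):
        """Return a completion only if the model looks reasonably sure of it."""
        text, token_logprobs = request_completion(prefix)

        # Geometric mean of token probabilities as a crude whole-completion confidence.
        avg_logprob = sum(token_logprobs) / max(len(token_logprobs), 1)
        confidence = math.exp(avg_logprob)

        if confidence < CONFIDENCE_THRESHOLD:
            return None  # show nothing rather than a confident-looking guess
        return text

    print(maybe_autocomplete("def add(a, b):"))  # None: confidence is ~0.71 here
    ```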



  • I think you’d be right that the direct cost of running the crawler and index would not be the issue. But fighting SEO to keep your results decent is probably a cost that dwarfs the basic technical cost of running them.

    And you’d need a technical security team on top of things, as link farms aren’t your only risk. I’m sure there are countless ways to manipulate the ranking algorithm to put your site on top, and Google probably has multiple teams fighting them full time.

    Many of these things would likely not be a problem for a startup, though. No one is paying SEO firms big money to get into a search index no one has heard of and hardly anyone uses, so these costs probably ramp up sharply as you become better known.


  • I’m not disputing that you might be right, but the Internet Archive runs a very different service. Mainly, Google needs to continuously prune its 400-billion-page index because of link rot, while the Internet Archive has the opposite aim: preserving sites that no longer exist.

    I’m also not sure they even crawl. Do sites get added on user request? Looking at a medium-popularity page, you’ll see it only gets a couple of captures a year.

    None of them. At least, none that I’m aware of. I just don’t think that direct expenses are the reason that there are only two major web search tools. I also don’t think Google and Bing are good examples to point at when estimating the cost of running a complete search engine.

    I would suggest direct expenses are the barrier, but perhaps crawling is not the main expense. I’d be interested to hear what barriers you think exist outside of direct expenses.


  • That website claims they add 3-5 billion pages a month. Google is doing that in a day or three, as recency of information is very important in search. Plus, that site claims 100 billion pages to Google’s 400 billion. It’s still an impressive project.

    Size isn’t everything, so the real question is: what search site uses only the Common Crawl index and has results on par with Bing or Google?




  • I think versioning is the better option.

    Are you writing about losing the backup drive?

    No, losing your main version. Imagine you have a computer with Syncthing and a server it syncs to. If you choose not to sync deletions, everything gets synced to the server, but all the stuff you deleted (draft documents, random files, photos from that time your kid held the camera button on your phone down and took 3000 photos in 30 seconds) will be gone from your computer yet still sitting on the server.

    When your computer gets struck by lightning and everything is destroyed but the server is fine, you now have to re-sort all your files, because everything you ever deleted is still in the server’s copy (see the toy example below).

    Your suggestion of enabling the option to keep previous versions is probably cleaner. Personally I prefer to keep previous versions and deduplicate to save space.
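
    As a toy illustration of the difference (this is not Syncthing’s actual behaviour in code, just the scenario above; as far as I know Syncthing’s file versioning parks old copies in a .stversions folder rather than deleting them):

    ```python
    # Toy model of the two policies: "don't sync deletions" vs. file versioning.
    laptop = {"thesis.odt", "draft1.odt", "IMG_0001.jpg"}

    # Server state under each policy, starting from a full sync.
    server_no_delete_sync = set(laptop)   # deletions are never propagated
    server_versioned_live = set(laptop)   # deletions propagate...
    server_versions_area = set()          # ...but old copies are kept to one side

    # You tidy up the laptop: delete the draft and the junk photo.
    for name in ["draft1.odt", "IMG_0001.jpg"]:
        laptop.discard(name)
        # Policy 1: nothing changes on the server, the clutter stays in the live tree.
        # Policy 2: the live tree mirrors the laptop; old copies move to the versions area.
        server_versioned_live.discard(name)
        server_versions_area.add(name)

    print(sorted(server_no_delete_sync))  # ['IMG_0001.jpg', 'draft1.odt', 'thesis.odt']
    print(sorted(server_versioned_live))  # ['thesis.odt'] - a clean tree to restore from
    ```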




  • Remember, sync isn’t a good backup. You’re thinking of drive loss, but if this is important data you also need to consider mistakes.

    If you accidentally delete files you shouldn’t have, you don’t want that deletion syncing to all your copies, so the data is gone for good and the “backup” doesn’t help.

    Personally I use borgmatic to keep incremental, deduplicated backups. Then I can go back to previous states.

    If you install Nextcloud All-in-One, it comes with a backup solution (also Borg-based). Then devices don’t need a copy of every file, but you’ll want your server to have a dedicated backup drive for this.

    I then sync my Borg repository to a Backblaze B2 bucket with rclone for an offsite, encrypted copy. That then meets the 3-2-1 backup plan (rough sketch at the end of this comment).

    I notice you mention Jellyfin. I don’t back up my Jellyfin media; the cloud storage for that could get very expensive, and I could get it again if I needed to.
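
    For anyone curious what that routine looks like, here is a minimal sketch assuming borg and rclone are installed and already configured; the repository path, source directories and B2 remote name are placeholders, and borgmatic just wraps the same steps behind a config file.

    ```python
    # Minimal sketch of the flow above: incremental, deduplicated borg archive,
    # then an offsite copy of the repository to Backblaze B2 via rclone.
    # REPO, SOURCES and REMOTE are placeholders, not real paths.
    import subprocess
    from datetime import datetime

    REPO = "/srv/backups/borg-repo"            # local borg repository
    SOURCES = ["/home", "/etc"]                # what to back up
    REMOTE = "b2:my-backup-bucket/borg-repo"   # rclone remote:bucket/path

    def run(cmd):
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    # 1. New archive: borg only stores chunks it hasn't seen before, which is
    #    what makes the backups incremental and deduplicated.
    run(["borg", "create", "--stats", f"{REPO}::{datetime.now():%Y-%m-%d}", *SOURCES])

    # 2. Thin out old archives so the repository doesn't grow without bound.
    run(["borg", "prune", "--keep-daily", "7", "--keep-weekly", "4",
         "--keep-monthly", "6", REPO])

    # 3. Mirror the repository (encrypted at rest, if it was created with
    #    encryption) to B2 for the offsite leg of the 3-2-1 plan.
    run(["rclone", "sync", REPO, REMOTE])
    ```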