There are some downstream / knock-on effects going on which can be explained… but I can’t help but wonder if today’s story is bigger than just AWS. AWS is saying it was an outage of a “few hours” for DynamoDB and DNS… and that doesn’t line up all that well with what people are reporting in the wild. I’m not trying to start a conspiracy theory, just wondering what the post-mortems will tell us, if anything. Obviously the suits want to keep embarrassing fuckups downplayed as much as possible.
As someone with a little inside-baseball knowledge: it really was only a few hours of downtime for DynamoDB and DNS. However, that caused EC2 to go down for roughly a day, which takes down pretty much a third of the internet. Once EC2 sorts itself out, the teams/companies that build on EC2 (almost all Amazon services use EC2 on the back end) have to get their ducks back in a row, and that can take any span of time, depending on how well their code was written to handle failures and how many people they’re willing to pay for on-call/overtime.
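The “how well their code was written to handle failures” part usually comes down to things like retries with backoff, jitter, and sane fallbacks around AWS calls. A minimal sketch of that idea using boto3 (the table name and key here are made up purely for illustration, not anything from the incident):

```python
import random
import time

import boto3
from botocore.config import Config
from botocore.exceptions import ClientError, EndpointConnectionError

# Let the SDK retry transient throttling/availability errors on its own.
dynamodb = boto3.client(
    "dynamodb",
    region_name="us-east-1",
    config=Config(retries={"max_attempts": 10, "mode": "adaptive"}),
)


def get_item_with_backoff(table, key, max_attempts=5):
    """Outer retry loop with exponential backoff + jitter, so a multi-hour
    DynamoDB brownout degrades gracefully instead of hammering the API."""
    for attempt in range(max_attempts):
        try:
            return dynamodb.get_item(TableName=table, Key=key)
        except (ClientError, EndpointConnectionError):
            if attempt == max_attempts - 1:
                # Give up and let the caller serve a cached/fallback value.
                raise
            time.sleep(min(30, 2 ** attempt) + random.random())


# Hypothetical usage:
# item = get_item_with_backoff("orders", {"order_id": {"S": "1234"}})
```

Services written like this (plus caching and circuit breakers) come back quickly once DynamoDB/DNS recover; services that just crash or block on every failed call are the ones that take another day of manual babysitting.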
It’s crazy, we are seeing unrelated services stop sending emails, issues with DNS, all sorts of strange stuff.
Same with us. Had to reboot/restart a number of things and resync clocks.
Maybe it’s cascade effects? Something depends on something else, which depends on a third thing that depends on AWS for something?
They want to keep the news of the rally over the weekend as quiet as possible.
And the Epstein files. I heard it didn’t affect international regions that much, so it’s rather convenient.
Dave explains the “long tail” of recovery:
https://youtu.be/KFvhpt8FN18