The issue was not caused, directly or indirectly, by a cyber attack or malicious activity of any kind. Instead, it was triggered by a change to one of our database systems’ permissions which caused the database to output multiple entries into a “feature file” used by our Bot Management system. That feature file, in turn, doubled in size. The larger-than-expected feature file was then propagated to all the machines that make up our network.

The software running on these machines to route traffic across our network reads this feature file to keep our Bot Management system up to date with ever changing threats. The software had a limit on the size of the feature file that was below its doubled size. That caused the software to fail.
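The failure mode described above (a size limit on the feature file, exceeded once the file doubled) can be sketched roughly as follows. This is an illustrative sketch only, not Cloudflare's actual code: `MAX_FEATURES`, `parse_features`, and `BotManagementConfig` are hypothetical names, the limit value is made up, and falling back to the last known-good file is just one possible mitigation, not something the excerpt says the real software does.

```python
from dataclasses import dataclass, field

MAX_FEATURES = 4  # hypothetical limit, kept small for illustration


class FeatureFileTooLarge(Exception):
    """Raised when a feature file holds more entries than the limit allows."""


def parse_features(lines, limit=MAX_FEATURES):
    """Parse one feature per non-empty line; reject files over the limit."""
    features = [line.strip() for line in lines if line.strip()]
    if len(features) > limit:
        raise FeatureFileTooLarge(f"{len(features)} features > limit {limit}")
    return features


@dataclass
class BotManagementConfig:
    """Holds the active feature set; keeps the old one if an update is rejected."""
    active: list = field(default_factory=list)

    def update(self, lines):
        try:
            self.active = parse_features(lines)
            return True
        except FeatureFileTooLarge:
            # Assumed mitigation: keep serving with the last known-good set
            # instead of letting the error take the whole process down.
            return False


config = BotManagementConfig()
good = ["f1", "f2", "f3"]
doubled = good + good  # duplicate entries double the file size

print(config.update(good))     # True
print(config.update(doubled))  # False
print(config.active)           # ['f1', 'f2', 'f3']
```

Without the `try`/`except` in `update`, the oversized file would raise straight out of the loader, which is the "software fails" outcome the excerpt describes.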

  • mech@feddit.org · 94 upvotes · 3 months ago

    A permissions change in one database can bring down half the Internet now.

    • SidewaysHighways@lemmy.world · 12 upvotes · 3 months ago

      certainly brought my audiobookshelf to its knees when i decided that that lxc was gonna go ahead and be the jellyfin server also

  • dan@upvote.au · 30 upvotes · 3 months ago

    When are people going to realise that routing a huge chunk of the internet through one private company is a bad idea? The entire point of the internet is that it’s a decentralized network of networks.

    • Jason2357@lemmy.ca · 3 upvotes, 1 downvote · 3 months ago

      Someone always chimes into these discussions with the experience of being DDOSed and Cloudflare being the only option to prevent it.

      Sounds a lot like a protection racket to me.

  • MonkderVierte@lemmy.zip · 27 upvotes · 3 months ago

    Meaning an internal error, like the previous two.

    Almost like one big provider with 99.9999% availability is worse than 10 providers with maybe 99.9% each.

    • Jason2357@lemmy.ca · 12 upvotes · 3 months ago

      Except, if you chose the wrong 1 of that 10 and your company is the only one down for a day, you get fire-bombed. If “TEH INTERNETS ARE DOWN” and your website is down for a day, no one even calls you.

    • jj4211@lemmy.world · 9 upvotes · 3 months ago

      Note that this outage by itself, based on their chart, was kicking out errors over the span of about 8 hours. This one outage would have almost entirely blown their downtime allowance under a 99.9% availability criterion, which works out to roughly 8.76 hours per year.

      If one big provider actually provided 99.9999%, that would be about 30 seconds of total outage time over a typical year. Not even long enough for people to generally be sure there was an ‘outage’ as a user. That wouldn’t be bad at all.
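
As a rough check on the availability arithmetic above, here is a small sketch that converts an availability percentage into a yearly downtime budget. The function name `downtime_budget_seconds` is made up for this example, and it assumes a 365-day year with no leap seconds or maintenance windows.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000, ignoring leap years


def downtime_budget_seconds(availability_percent):
    """Return the allowed downtime per year, in seconds, for a given availability."""
    return SECONDS_PER_YEAR * (1 - availability_percent / 100)


three_nines = downtime_budget_seconds(99.9)
six_nines = downtime_budget_seconds(99.9999)

print(f"99.9%:    {three_nines / 3600:.2f} hours per year")  # 8.76 hours per year
print(f"99.9999%: {six_nines:.1f} seconds per year")         # 31.5 seconds per year
```

So an outage of about 8 hours uses up nearly all of a 99.9% yearly budget, while a 99.9999% budget is only about half a minute.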