If there is one inevitable truth in technology, it is that disasters happen. Calamities like fires, climate change-driven disasters, or major cloud provider outages share one thing in common: you can’t control them. What you can control, however, is how you handle them when they happen. You can practice resiliency.
At RoachFest23, CockroachDB’s most recent customer conference, Thomas Boltze (Head of Cloud and Engineering Excellence with Santander) shared the core tenets of resilient systems, and how to practice them in the real world.
When we talk about failures and outages, we talk about root causes. But what are we not talking about? That the number one root cause of system failures is human error. When something goes wrong, it’s not your hard drive failing. Somebody did something they should not have done. Someone decided it was a good idea to build the data centers for two different regions right next to each other, which was fine until there was a fire. Or your hard drive runs full, your server stops working, but you didn’t know because you didn’t have alerting in place.
CockroachDB: The Definitive Guide shows how to architect apps for effortless scale, bulletproof resilience, and low-latency performance for users anywhere and everywhere.