Azure Beats the Corporate Datacentre

by | Sep 17, 2018 | Azure, Cloud, Information Architecture

My initial reading of the now-released preliminary Root Cause Analysis (RCA) of the 4 September 2018 Azure outage has further reinforced my view of the intrinsic value that the cloud brings to IT operations of any scale.

You read that right: it reinforced my view rather than diminishing it.

The preliminary RCA, published by Microsoft, highlights a number of key points that, in my view, clearly demonstrate why the cloud is the right choice for so many organisations, particularly those that operate at scale.

The key question for me is: “What would you do if this were your datacentre?” This is where the rubber hits the road with Azure (and, to be fair, other cloud services such as AWS), especially in IaaS workload scenarios.

Self-hosting organisations of any size simply do not have the resources to match Microsoft’s response during this outage. Critically, “resources” in this case means not just people but resources of all types:

  • Human – Microsoft has some of the world’s best engineers, many of whom literally wrote the book on cloud-scale operations and incident response
  • Compute/Storage – the sheer scale of Microsoft’s compute and storage resources means that there is always somewhere else for workloads to run, be it intra-datacentre or extra-datacentre; Microsoft simply has spare iron lying around, available, all the time
  • Resilience – Microsoft has invested millions (possibly billions) of dollars in ensuring that the very fabric of Azure can keep functioning whatever the failure state. Azure is like the Starship Enterprise: no matter how ‘damaged’ it is, a redundant/backup system can be switched on in a relatively short period of time. Sure, this is a benefit of having oodles of compute/storage, but that capacity has to be wired together in the right way to be useful, and it needs to be seamlessly switchable to minimise or avoid data loss
  • Orchestration – the investment made in both process and tooling means that switching services, restoring services and initially geo-enabling services is practically seamless for most users of Azure. Simply put, it just works

For me, Azure’s naysayers are ill-informed and, in some cases, just plain wrong.

The moaning about whatever beef folks had with Azure during the outage is simply misplaced. If a catastrophic failure of the type experienced in the South Central US region had happened in a corporate datacentre, that organisation would probably still be working in a ‘failover’ or ‘recovering’ mode as I write or, worse, would simply be offline.

Running an IT operation is, and always will be, 99% swan swimming and 1% abject chaos. With Azure, at least you stand a fighting chance of staying on top of the chaos when disaster strikes; without it, you are simply stopped. Dead.

More to follow…
