June 22, 2024

Paull Ank Ford


Do Staging Procedures Need a Rethink?


“Has anyone started having conversations with their CIO/CEO about going back to an in-house mail server? I advocate for it”

Given the scale of its user base, and with a contract worth up to $10 billion in the bag to run the back end of a superpower’s military, Microsoft might want to start thinking about how it can set up a staging process for its Azure cloud that allows it to deploy changes and reliably roll back those changes when things break.

(We know, it is easy to say so from a safe distance…)

Redmond was at it again late Monday, knocking an (apparently significant) “subset of customers in the Azure Public and Azure Government clouds” offline for three hours, with swathes of users globally encountering errors performing authentication operations. A number of services were affected, including Microsoft 365.

The company blamed the issue on a “recent configuration change [that] impacted a backend storage layer, which caused latency to authentication requests.” (Read: users could not log in to Teams, Azure and more for hours because of the snafu.)

A full root cause analysis is pending. (We will update this piece when we see it.) The outage was felt by users from 22:25 BST on Sep 28 2020 to 01:23 BST the following morning.

The issue comes a fortnight after a protracted outage in Microsoft’s UK South region triggered by a cooling system failure in a data centre. With temperatures rising, automated systems shut down all network, compute, and storage resources “to protect data durability” as engineers rushed to take manual control.

Earlier this month, meanwhile, Gartner said it “continues to have concerns related to the overall architecture and implementation of Azure, despite resilience-focused engineering efforts and improved service availability metrics during the past year”.

Microsoft Azure CTO Mark Russinovich said in July 2019 that Azure had formed a new Quality Engineering team within his CTO office, working alongside Microsoft’s Site Reliability Engineering (SRE) team to “pioneer new approaches to deliver an even more reliable platform” following customer concern at a string of outages.

He wrote at the time: “Outages and other service incidents are a challenge for all public cloud providers, and we continue to improve our understanding of the complex ways in which factors such as operational processes, architectural designs, hardware issues, software flaws, and human factors can align to cause service incidents.”

“Has anyone started having conversations with their CIO/CEO about going back to an in-house mail server? I advocate for it,” one frustrated user noted on a global Outages mailing list meanwhile… If cloud is your compressed audio stream that you’re not sure you own, it may not be long before in-house mail servers become the vintage quality vinyl of the IT world: outdated, but very much back in demand.

Stranger things have happened.