The headline picture does not show areas affected by COVID-19. It shows the regions where services provided or hosted on Amazon infrastructure were affected on Wednesday, 25th November, 2020. Scores of websites rely on this infrastructure to function. Although the outage affected only one of AWS’s 24 regions, US-East-1, it took down many popular web-based services that use its servers, such as Roku, Flickr, 1Password, Autodesk, The Washington Post and Adobe Spark. Full recovery stretched into the wee hours of Thursday morning. Ironically, the outage also prevented Amazon from posting updates on its Health Dashboard, leaving thousands of customers in the dark.
In the same vein, here are some news headlines from the recent past:
|Microsoft services suffer third outage in last 10 days|
|Microsoft Outlook was down worldwide for four hours|
|Some Microsoft services, including Outlook, Office 365, and Microsoft Teams, experienced a multi-hour outage|
A deeper look at some of these incidents indicates a systemic problem with the big tech companies that provide SaaS-based cloud services such as Office 365 or Adobe Creative Cloud. A single outage affects multiple services: when the Microsoft outage happened on 28th September, users had no access to Teams, SharePoint, OneDrive and Outlook. The “writing is on the wall”, as they say, or, as the case may be, on Microsoft’s Twitter feed.
If you were one of the unfortunate ones affected, you lost not only access to your email but also the ability to communicate with other team members via Microsoft Teams. To underscore the point, Microsoft Teams, the online collaboration tool from Microsoft, is the self-proclaimed life-blood of remote working during the COVID-19 era.
Quality of Service is benchmarked at Five 9s. In simple terms, tech companies aspire to Five 9s of reliability, which translates to an uptime of 99.999%, or no more than 1 minute of downtime in every 100,000.
A year of 365 days, with 24 hours in each day and 60 minutes in each hour, has roughly 525,600 minutes [1].
A 99.999% uptime would mean a downtime of only 5 minutes and 15 seconds in a year. That doesn’t sound so bad, does it?
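The arithmetic behind that figure can be checked with a few lines of Python. This is only a minimal sketch: the function names are illustrative, and it assumes a 365-day year with no leap day, as in the text above.

```python
# Downtime budget implied by an availability target over one year.
# Assumes a 365-day year (no leap day), matching the figure in the text.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Minutes of downtime per year permitted by the given uptime percentage."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


def as_minutes_and_seconds(minutes: float) -> tuple[int, int]:
    """Split fractional minutes into whole minutes and whole seconds."""
    total_seconds = round(minutes * 60)
    return divmod(total_seconds, 60)


if __name__ == "__main__":
    for target in (99.9, 99.99, 99.999):
        mins, secs = as_minutes_and_seconds(allowed_downtime_minutes(target))
        print(f"{target}% uptime allows about {mins} min {secs} s of downtime per year")
```

Running the same function for Three 9s and Four 9s shows how quickly the budget shrinks: each extra nine cuts the allowed downtime by a factor of ten.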
The chart below shows the uptime of Office 365. Microsoft seems to be approaching the holy grail of 99.999% reliability for its hosted Office 365 solution.
So how does Microsoft still proclaim Five 9s uptime if it is suffering hour-long outages every other week? Here is the spoiler: unfortunately, the 99.999% uptime is not calculated on a per-datacenter basis. It is calculated as an overall value on a monthly basis. The downtime experienced by affected users in a month is compared to the total uptime available to all active users in the same month to come up with the availability figure. Additionally, planned outages are excluded from this figure. Statistics, eh?
And here is how a downtime of less than 0.03% might actually end up killing your business, for good.
According to this article by ZDNet [3], Microsoft has approximately 200 million users, of which roughly 150 million are Office 365 users. Let’s assume for a moment that 1.0 million of them are served from the Toronto datacenter. According to Microsoft, it has about 41 datacenters, of which 2 are in Canada [4].
So, let us speculate: how would a whole day of outage in a month translate to service availability for your business, compared to Microsoft’s statistics?
And here are the calculations [5]:
|Minutes in a 30-day month|43,200|
|Office 365 active users|150,000,000|
|User-minutes available in the month|6,480,000,000,000|
|Users in Office 365 datacenter region (Toronto)|1,000,000|
|Projected outage|1 day = 24 hours = 1,440 minutes|
|Service outage in user-minutes|1,440,000,000|
|Microsoft calculated SLA|99.9778%|
|(Your) business calculated SLA|96.67%|
So, you are essentially looking at no email for a whole day! And while Microsoft proclaims a 99.98% uptime, you are left to explain to your clients why you can only provide a 96.67% uptime guarantee.
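The two availability figures come from the same outage measured against two different denominators. The sketch below reproduces the calculation; the user counts are the article's own assumptions (150 million Office 365 users, 1 million of them in Toronto), and the function and variable names are illustrative only.

```python
# Same one-day outage, two points of view.
# User counts are the article's assumptions, not published Microsoft figures.
MINUTES_IN_MONTH = 30 * 24 * 60   # 43,200
TOTAL_USERS = 150_000_000         # assumed Office 365 active users
AFFECTED_USERS = 1_000_000        # assumed users in the Toronto region
OUTAGE_MINUTES = 24 * 60          # one full day = 1,440 minutes


def availability_percent(lost_minutes: float, available_minutes: float) -> float:
    """Availability as a percentage of the minutes that were supposed to be served."""
    return 100 * (1 - lost_minutes / available_minutes)


# Provider view: lost user-minutes against all user-minutes worldwide.
provider_view = availability_percent(
    AFFECTED_USERS * OUTAGE_MINUTES,
    TOTAL_USERS * MINUTES_IN_MONTH,
)

# Business view: every one of your users was down for the whole outage.
business_view = availability_percent(OUTAGE_MINUTES, MINUTES_IN_MONTH)

if __name__ == "__main__":
    print(f"Provider-wide availability: {provider_view:.4f}%")
    print(f"Your availability:          {business_view:.2f}%")
```

Changing `AFFECTED_USERS` shows how the provider-wide figure barely moves as long as the affected region is a small share of the total user base, while the business view stays the same.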
The bigger you are, the more noticeable you are to the wrong people
What do Booz Allen Hamilton, U.S. Voter Records, Dow Jones & Co, WWE, Verizon Wireless, Time Warner Cable, Pentagon Exposures, Accenture, National Credit Federation and Alteryx have in common, other than being multi-million or sometimes multi-billion dollar organizations?
Well, all of them have at some point had major breaches of confidential data stored on Amazon’s cloud. These breaches occurred mostly through user error, but sometimes also through software configuration issues. The truth is that Amazon, being one of the largest companies in the world and arguably one of the largest public cloud infrastructure providers alongside Google and Microsoft, has a large target painted on its back. Having many datacenters also increases the attack surface available to cyber miscreants, and with it the probability that somewhere the implemented security mechanism is just not strong enough.
Hence, a large provider may not always make the most business sense for your organization.
This raises the question: if the large corporations providing technology infrastructure are so prone to issues, what is the right solution?
There is no silver bullet. A Microsoft-hosted Exchange might be right for a small startup where cost is the primary driver of business decisions. However, if business continuity and data security are primary concerns, the additional expense of hosting your own servers with an IT MSP is a well-justified business decision.
Additionally, having your infrastructure hosted by your own IT MSP guarantees you a better response time for incidents. With Microsoft as your provider, however big you are as a client, the revenue you generate for Microsoft is still just a drop in a very big pond.
Contact us today to do a cost-benefit and ROI analysis for your business.