Last week Microsoft Azure suffered a major outage, disrupting enterprises worldwide that had shifted their workloads to Microsoft's public cloud, including companies that have upgraded to Office 365.
The cloud skeptics are already gathering to tell these companies they shouldn't have moved to the cloud. But ask the IT managers of LAN-based services whether they have ever had unplanned downtime, and of course the answer is yes. So what's the answer to the downtime conundrum?
The simple solution is to treat the cloud with the same level of respect we've given our on-premises systems for decades. Ask yourself: are your core services, like email, so important that your business cannot do without them?
If the answer to that question is ‘yes’, and it usually is, then you should go for a blended-cloud approach.
While these events are frustrating and disruptive for Microsoft customers (or Google's, or any other service's for that matter), they are no reason to abandon plans to move to the cloud, or to retreat to the shelter of the LAN. This incident should, however, prompt IT teams to check that they are being careful about which core cloud services they choose and how they protect them.
When you move critical services and data, like email, to the cloud, you must also plan for the inevitability that, at some point, the service will go down – just as you would with business continuity solutions if you kept the infrastructure in-house. With Mimecast services you keep employees' email up and running, and keep them productive, even in the event of an outage.
What happens when the cloud service goes down? Every IT leader should be able to answer that question immediately and show their continuity strategy. A strategy based on planning and technology, not hope.
For more information about Mimecast cloud email continuity services please click here.
The 2014 Atlantic Hurricane season is in full swing through November, putting your organization – and mission-critical systems, like email – at sudden risk from tropical storms, floods and fires.
Ask yourself: When was the last time you tested your business continuity plan? If the answer is one year or longer, you risk significant network downtime, data leakage and financial loss. According to Gartner, depending on your industry, network downtime costs an average of $5,600 per minute – more than $300,000 per hour. Don't wait for disaster to strike. Treat email like the critical system it is, and avoid making these six mistakes that could jeopardize business continuity – and your job.
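To put Gartner's figure in perspective, here is a quick back-of-the-envelope calculation. The per-minute rate is the average cited above; the outage durations are hypothetical examples:

```python
# Back-of-the-envelope estimate of outage cost.
# COST_PER_MINUTE is the Gartner average quoted in the text;
# the outage durations below are hypothetical.
COST_PER_MINUTE = 5600  # USD per minute of network downtime

def downtime_cost(minutes: float, cost_per_minute: float = COST_PER_MINUTE) -> float:
    """Estimated cost in USD of an outage of the given length."""
    return minutes * cost_per_minute

print(downtime_cost(60))      # one hour:  336,000 USD
print(downtime_cost(8 * 60))  # a working day: 2,688,000 USD
```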
- Not testing your continuity solution. You've devised and implemented what you believe to be a solid continuity solution, but you've never given it a production test. Instead, you cross your fingers and hope that, when the time comes, it works as planned. There are two major problems with this. First, things get dusty over time: the technology may no longer work, or worse, it may never have been properly configured in the first place. You might also not be backing up critical systems regularly – without testing, you'll learn the hard way at restore time that data was never fully backed up. Second, when it comes to planning, you need a clear chain of command should disaster strike. If your network goes down, you need to know who to call, immediately. Testing once is simply not enough: test your solution at least once a year, and depending on the tolerance of your business, likely more frequently – quarterly or even monthly.
- Forgetting to test fail back. Testing the failover capabilities of your continuity solution is only half the job. Are you prepared for downtime that could last hours, days or even weeks? The ability to go from the primary data center to the secondary one – and then back again – is critical, and this needs to be tested. You need to know that data can be restored into normal systems after downtime.
- Assuming you can easily engage the continuity solution. It's common to plan for "normal" disasters like power outages and hardware failure. But in the event of something more severe, like a flood or fire, you need to know how difficult it is to trigger a failover. You also need to know where you need to be. For example, can you trigger the failover from your office or data center? It's critical to know where the necessary tools are located and how long it'll take you or your team to reach them. Physical access is critical. Distribute tools to multiple data centers, as well as your local environment.
- Excluding policy enforcement. When an outage occurs, you must still account for regulatory and policy-based requirements that impact email communications. This includes archiving, continuity and security policies. Otherwise, you risk non-compliance.
- Trusting the agreed RTO and RPO. In reality, you've got to balance risk and budget. When an outage happens, will the email downtime the business agreed to really stick? In other words, will the CEO really tolerate having no access to email for two hours? And will it be acceptable for customers to be out of touch with you for a full day? The cost trade-offs behind your RTO and RPO can leave a gap between what the business expects and what the restore actually delivers. If you budget for a two-day email restore, be prepared that during an outage this realistically means two days without email for the entire organization. As part of your testing methodology, you may discover that you need more or less time to back up and restore data. As a result, you may need to implement more resilient technology – like moving from risky tape backup to more scalable and accessible cloud storage.
- Neglecting to include cloud services. Even when you implement cloud technologies to deliver key services, such as email, you are still responsible for planning for disruptions. Your cloud vendor will do disaster recovery planning on their end to provide reliable service, but mishaps – and disasters – still happen. Mitigate this risk by stacking multi-vendor solutions wherever possible to ensure redundancy, especially for services like high-availability gateways in front of cloud-based email services, or cloud backups of key data.
With the proper testing and upfront business continuity preparation, you can significantly reduce – or even prevent – email downtime, data leakage and financial loss after disaster strikes.
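To make the RTO/RPO point above concrete, here is a minimal sketch – with hypothetical function names and values, not any real product's API – of the worst-case data-loss check you would run against a backup schedule:

```python
# Minimal sketch: does a backup schedule satisfy the agreed RPO?
# All names and numbers are hypothetical illustrations.

def max_data_loss_hours(backup_interval_hours: float) -> float:
    """Worst case: an outage hits just before the next backup runs,
    so everything written since the last backup is lost."""
    return backup_interval_hours

def meets_rpo(backup_interval_hours: float, rpo_hours: float) -> bool:
    """True if the worst-case data loss stays within the agreed RPO."""
    return max_data_loss_hours(backup_interval_hours) <= rpo_hours

# Nightly backups against a 4-hour RPO: up to 24 hours of data at risk.
print(meets_rpo(backup_interval_hours=24, rpo_hours=4))  # False
# Hourly backups comfortably meet the same RPO.
print(meets_rpo(backup_interval_hours=1, rpo_hours=4))   # True
```

Running this kind of check during a continuity test is how the gap between the RPO on paper and the RPO in practice gets caught before an outage does it for you.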
Any business that’s been through a migration at least once will remember that most of the migration effort was spent in planning. Otherwise they may remember the large mop-up operation and the time spent visiting desktops, recovering mail and rolling aspects of the migration backwards and forwards.
Exchange migrations tend to be complex. Even smaller organizations running Small Business Server with fewer than 75 users may take a week or more to plan, prepare and execute their email migration.
Data loss (what PSTs?), client upgrades and wrongly migrated data tend to come to mind when thinking about what can go wrong, as well as the mail server that crashed during the migration. During a migration a fair amount of change is introduced and additional processing is forced onto both the source and target Exchange platform. For an older platform at the limits of its lifespan or operational capacity, the extra overhead an email migration introduces may be the straw that breaks the camel’s back.
Cloud based email continuity may act as insurance in this regard by enabling client continuity and transactional continuity in case the migration wobbles or breaks. Let’s explore that in a bit more detail.
Migrations are heavily process driven. Before you can migrate, a fair amount of surveying, planning and lab testing needs to be accomplished. It makes sense to use the desktop visits in the plan/survey phase to deploy the agents needed on the desktops to make client continuity possible.
If an Exchange server in the source or target organization were to fail during the migration, Outlook clients would be redirected to the cloud with little or no disruption to service or – crucially – the user experience. This allows the outage to be addressed, and mail flow and client mail service to be restored, without the pressure of fighting two fires concurrently – i.e., a broken environment and a broken migration.
Cloud-based email continuity also lets you benefit from the scale of the cloud itself, provided of course your users have the network or internet connectivity required to beat a path to it.
In our day-to-day lives we're generally quite comfortable with the argument for personal insurance – medical insurance, insurance against theft, cover for breaking a leg while skiing, and so on. All of these boil down to paying a small amount of money to a much larger entity, and thereby being guaranteed the benefit of that entity's scale and reach should something unfortunate happen.
As the idea of cloud on demand becomes more pervasive, insuring your migration in the short term against loss of email continuity makes as much sense as taking out insurance on your car before you take it on the road.
In short, Google claimed its Google Apps service had achieved 99.984% uptime in 2010 and, citing an independent report, went on to say this was 46 times more available than Microsoft's Exchange Server. Microsoft retaliated by saying BPOS achieved 99.9% (or better) uptime in 2010, in line with their SLA. Microsoft quite rightly protested at Google's definitions of uptime and what should or should not be included.
The discussion continues.
Uptime is one of those things included in your service provider's SLA that you never really give much attention to, unless it's alarmingly low: 90%, for example. Most Cloud, SaaS or hosted providers will give uptime SLA figures of between 99.9% (three nines) and 99.999% (five nines). Mimecast proudly offers a 100% uptime SLA.
All of these nines represent different levels of 'guaranteed' service availability. For example, one nine (90%) allows for 36.5 days of downtime per year. As I said, alarming. Two nines (99%) would give you 3.65 days of downtime per year, three nines (99.9%) 8.76 hours, four nines (99.99%) 52.56 minutes and five nines (99.999%) 5.26 minutes per year. Lastly, six nines (99.9999%), which is largely academic, allows a mere 31.5 seconds.
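The arithmetic behind these figures is simple enough to script. This short sketch converts an SLA percentage into the downtime it permits over a 365-day year:

```python
# Convert an uptime SLA percentage into the maximum downtime it
# permits per year (365-day year, matching the figures in the text).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes

def allowed_downtime_minutes(uptime_pct: float) -> float:
    """Minutes of downtime per year permitted by an uptime percentage."""
    return MINUTES_PER_YEAR * (1 - uptime_pct / 100)

for pct in (90, 99, 99.9, 99.99, 99.999, 99.9999):
    print(f"{pct}% uptime -> {allowed_downtime_minutes(pct):,.2f} minutes/year")
```

For example, 99.99% works out to 52.56 minutes per year, and 99.9999% to roughly half a minute – the 31.5 seconds mentioned above.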
What does all of this mean to you as a consumer of these services? In terms of actual service, very little, unless you happen to be in the minority percentage; that is to say everything has gone dark and quiet and you're suffering a service outage. What is much more important is how the vendor treats you in the event they don't achieve 100%. It is hard for any vendor to absolutely guarantee 100% uptime all of the time, so you must make sure there is a provision for service credits or financial compensation in the event of an outage. If not, the SLA is worthless. Any reputable SaaS or Cloud vendor will have absolute confidence in their infrastructure, so based on historical performance a 100% availability SLA will be justifiable. Mimecast offers 100% precisely for this reason. We have spent a large amount of R&D time on getting the infrastructure right so it can be used to back up our SLA, and as a result we win many customers from vendors whose SLAs have flattered to deceive.
Perhaps a larger issue we ought to consider is highlighted by the arrows Google is flinging in Microsoft's direction: namely, how do vendors really define uptime? What sort of event do they class as an outage? Does the event have to last a certain length of time to qualify? Is planned downtime included in the calculation? And so on.
There is no standard by which uptime is defined, and common sense isn't always applied either. In other markets, consumers are reasonably well protected from spurious vendor claims by independent third parties like Consumer Reports or Which?. Not so with the claims tech companies make about the effectiveness of their solutions, and the result is a great deal of spin, which inevitably leads to misinterpretation and confusion.
Fortunately, we're not the only ones to see the need for standards here. Although it's early days still, you can get an overview of ongoing current efforts at cloud-standards.org.
Google and Microsoft's argument is based largely on differences in measurement rather than any meaningful difference in level of service. In a highly competitive market, any small differentiation can be a perceived bonus (by the vendor), but if we're all using different tape measures to mark our lines, the only reliable way to tell who comes out on top is to talk to the long-term customers.