Continuity: My Blackberry and other mobile devices

RIM & their Blackberry handsets (other mobile devices are available) are having a hard time of it in some Middle-Eastern countries at the moment. As this blog and other outlets have reported, the encryption used to protect data in transit isn't agreeable to the Governments of those States.

Whilst this is alarming news for those users who might be affected by the controls, the rest of us don't have to worry so much about a sudden loss of functionality on our handhelds. The regulation wrapped around encryption is more lenient here in the US and most of Europe. Why then do I still have a twinge of worry in the back of my mind?

What this saga does highlight is our reliance on a solution or service that can act as a single point of failure in its own right. Access to mobile email is ubiquitous these days, regardless of your choice of mobile device; the rise of smartphones has allowed everyone to access everything from anywhere. And when "anywhere and everything" don't line up with access, boy do we know about it.

As someone who has been on the end of irate phone calls from a C-level manager about lack of access to mobile email, I understand that balancing availability is a bit like stacking slices of Swiss cheese. Line up all the holes and all is right with the world, but move a single slice out of place and access stops. Of course those slices can represent many things, from the device to the service provider to my own email infrastructure.
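To put rough numbers on the Swiss cheese picture, here is a minimal Python sketch. It assumes each slice fails independently of the others, which real services rarely do, and every availability figure in it is made up purely for illustration. The slice names and the `chain_availability` helper are my own, not anything from RIM or a carrier.

```python
from math import prod

def chain_availability(slices):
    """Availability of a service that needs every slice working at once.

    Assumes the slices fail independently, so the individual
    availabilities simply multiply together.
    """
    return prod(slices.values())

# Hypothetical availability of each slice (fraction of time it is up).
slices = {
    "handset": 0.999,
    "carrier network": 0.995,
    "vendor relay": 0.998,
    "BES": 0.990,
    "mail server": 0.999,
}

overall = chain_availability(slices)
print(f"End-to-end availability: {overall:.2%}")
print(f"Weakest single slice:    {min(slices.values()):.2%}")
```

The point of the sketch is that the end-to-end figure always comes out lower than the weakest slice on its own, so every extra slice in the stack costs a little more availability.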

Mobile email is quick and easy to deploy, and unbelievably useful, but how many of us consider the wider impact of that service grinding to a halt? I don't want to pick on RIM or Blackberry, but they have had some well-publicized network outages. Other devices suffer too, though perhaps not in the same way, because they are more reliant on our local network being up and running.

Many of the BES and Blackberry solutions I see were first installed some time ago, probably when the early 5000 or 6000 series devices were introduced. Remember those, with the integrated phone you could only use with a headset? Back then we installed only a single BES; today we're looking at clustering them for full resilience. But even then we're still limiting the extent to which we can continue to provide service.
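A quick back-of-the-envelope sketch shows why clustering only gets us so far. It assumes independent failures and uses invented availability figures; `clustered` and `end_to_end` are hypothetical helper names, not part of any BES tooling.

```python
def clustered(availability, nodes):
    """Availability of a cluster that is up while at least one node is up.

    Assumes nodes fail independently of each other.
    """
    return 1 - (1 - availability) ** nodes

def end_to_end(parts):
    """Availability of a chain in which every part must be up."""
    result = 1.0
    for availability in parts:
        result *= availability
    return result

# Hypothetical figures for everything outside our own server room:
# handset, carrier network, vendor relay.
outside = [0.999, 0.995, 0.998]
bes = 0.99  # hypothetical availability of a single BES

single = end_to_end(outside + [bes])
cluster = end_to_end(outside + [clustered(bes, 2)])
ceiling = end_to_end(outside)

print(f"Single BES:    {single:.3%}")
print(f"Two-node BES:  {cluster:.3%}")
print(f"Ceiling set by everything else: {ceiling:.3%}")
```

Under these assumptions the cluster improves the overall figure, but it can never lift it above the ceiling set by the slices we don't control.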

I know complete site outages are rare, but loss of service to a device isn't. Hand on heart, is a simple on-site cluster the best we can do? Have we sat down and examined every small part of that service to make sure we can cope in the event of an outage? If I sent you my Swiss cheese model, would all of your slices line up?