Velocity: How fast is your Exchange Server Store growing?

This week Mimecast has been at the Gartner Data Center Conference 2011 in Las Vegas, with a packed agenda full of insightful discussions and presentations. As expected, the Cloud was a strong trend throughout the week, but I couldn't help noticing that another trend has emerged since the last summit: Big Data, a topic this blog has written about many times before.

One particularly compelling presentation, by Gartner Research VPs Merv Adrian and Sheila Childs, delved into Big Data. The session was standing room only, so this is obviously a hot topic for people looking for insight to help them solve their own unique problems.

Adrian and Childs identified a shortcoming in the way business and technology leaders talk about big data: the emphasis is often placed on volume. They rightly pointed out that:

"The most difficult information management issues emerge from the simultaneous and persistent interaction of extreme volume, variety of data formats, velocity of record creation and variable latencies, and the complexity of individual data types within formats."

Because we concentrate on the volume of data, we often forget about its velocity, variety and complexity. Adrian and Childs went on to quantify velocity, which is when I started relating it to email data and Exchange Stores.

Velocity involves streams of data, structured record creation and availability for access and delivery. Velocity means both how fast data is being produced, and how fast the data must be processed to meet demand.

The most important factor when thinking about Big Data in relation to Microsoft Exchange Server, in my opinion, is velocity. Of course, most Exchange databases won't hold the sort of big data that most data center managers have to worry about, but for those of us who manage Exchange Servers, I'll bet the data therein is one of the largest repositories in your environment. To borrow a phrase from our Chief Scientist, you have essentially got a nano-Google's worth of data: it's important to you, but it's nothing that hasn't been dealt with before. Try telling that to the Exchange administrator, though, when they're planning to migrate the stores from one version of Exchange to another.

So what is the velocity of your Exchange Server? If velocity is the stream of data, record creation and availability for access and delivery, I'm sure there must be an equation that will actually give us a figure for this. But I was thinking about it more in terms of everyday reality, especially if that reality means an upgrade or migration.

The unique big data complexity that exists within each Exchange environment is compounded by the velocity of the email environment that surrounds it. The data will continue to grow at a rate that can only be determined by a number of local factors: corporate culture, use of email, access to email and integration of email into other systems. Again, I'm sure there is a quantitative way to work out what this velocity is.
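As a rough illustration of what such a quantitative measure might look like, here is a minimal Python sketch that treats velocity as average store growth in GB per day. The function names (`growth_velocity`, `projected_size`) and the sample figures are hypothetical, invented purely for illustration; real numbers would have to come from your own periodic measurements of the store size.

```python
from datetime import date


def growth_velocity(samples):
    """Average growth in GB per day between the earliest and latest sample.

    `samples` is a list of (date, size_in_gb) pairs, in any order.
    """
    ordered = sorted(samples)
    (start_day, start_gb), (end_day, end_gb) = ordered[0], ordered[-1]
    elapsed_days = (end_day - start_day).days
    if elapsed_days <= 0:
        raise ValueError("need samples from at least two different days")
    return (end_gb - start_gb) / elapsed_days


def projected_size(samples, days_ahead):
    """Linear projection of the store size `days_ahead` days after the latest sample."""
    latest_gb = sorted(samples)[-1][1]
    return latest_gb + growth_velocity(samples) * days_ahead


# Hypothetical monthly measurements of one Exchange store (sizes in GB).
samples = [
    (date(2011, 9, 1), 400.0),
    (date(2011, 10, 1), 415.0),
    (date(2011, 11, 1), 430.5),
    (date(2011, 12, 1), 445.0),
]

velocity = growth_velocity(samples)
print(f"Velocity: {velocity:.2f} GB/day")
print(f"Projected size in 90 days: {projected_size(samples, 90):.1f} GB")
```

A straight-line average like this ignores seasonal peaks and the local factors listed above, so it is best read as a first estimate to take into migration planning rather than a forecast.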

When you're thinking of doing something with your nano-Google Exchange store, I would suggest that getting a grip on the velocity of Exchange is the first step. I doubt very much that you can do anything to throttle this velocity, at least not without upsetting your users. So I'm drawn to the phrase "Just Enough on Site", which we use at Mimecast to describe an Exchange environment that has been given the benefit of Cloud Augmentation to take the Big Data load off the server before, during and after a tricky migration.

I would argue that the amount of 'online' data needed in an Exchange Server is pretty minimal, probably about a month or two's worth. The rest doesn't need to be offline, but keeping it near-line is far more productive. Remember, velocity is also about how fast the data must be processed to meet demand. Surely putting the older, less frequently accessed data near-line in the cloud means your Exchange Server can concentrate on the online velocity of the real-time data?
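To put a number on how much of a store would stay online under a "month or two" rule, a minimal Python sketch might split messages by age. The function name `split_by_age`, the 60-day default and the sample messages are all assumptions made for illustration; none of this reflects how Exchange or any Mimecast product actually works.

```python
from datetime import date, timedelta


def split_by_age(messages, today, online_days=60):
    """Return (online_total, nearline_total) for (received_date, size) pairs.

    Messages received within the last `online_days` days count as online;
    anything older counts as a near-line candidate.
    """
    cutoff = today - timedelta(days=online_days)
    online = sum(size for received, size in messages if received >= cutoff)
    nearline = sum(size for received, size in messages if received < cutoff)
    return online, nearline


# Hypothetical mailbox contents: (date received, size in MB).
messages = [
    (date(2011, 12, 1), 2.0),
    (date(2011, 11, 15), 5.0),
    (date(2011, 6, 3), 40.0),
    (date(2010, 2, 20), 120.0),
]

online_mb, nearline_mb = split_by_age(messages, today=date(2011, 12, 8))
share = nearline_mb / (online_mb + nearline_mb)
print(f"Online: {online_mb} MB, near-line candidate: {nearline_mb} MB ({share:.0%})")
```

Running the same split against real mailbox statistics, with a few different values of `online_days`, would show how much data a given retention window actually leaves on the server.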
