Operational Maturity part two - Belts and braces

Following on from the previous post by our guest blogger Nic Blank on operational maturity and storage, I'd like to call out another area of operational maturity: transporting and routing mail. Exchange 2010's transport high availability model is rather simple: add more Hub Transport servers and they become redundant, shadow transport (shadow redundancy, in Exchange terms) ensures mail delivery, and so on. However, this requires some planning. Shadow transport is a great feature; it really is. It allows a Hub Transport server to fail without the loss of messages in transit. However, two caveats come to mind:
  1. You need more than one Hub Transport server in a given Active Directory site in order for Hub Transport servers to load balance mail traffic.
  2. You need an Exchange server on either side of the Hub Transport server: either an Edge or another Hub Transport server, or a mail server acting as the originator or destination.
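To make the first caveat concrete, here is a toy Python model of the idea behind shadow transport. It is only a sketch under my own assumptions, not how Exchange implements the feature: the `Hub` and `PreviousHop` classes and their methods are hypothetical names invented for illustration. The previous hop keeps a shadow copy of each message until the hub confirms onward delivery, and if the hub fails first, it resubmits the message to another hub in the same site.

```python
from dataclasses import dataclass, field


@dataclass
class Hub:
    """Hypothetical stand-in for a Hub Transport server."""
    name: str
    healthy: bool = True
    queue: list = field(default_factory=list)

    def accept(self, message: str) -> bool:
        """Queue the message if this hub is up."""
        if not self.healthy:
            return False
        self.queue.append(message)
        return True


@dataclass
class PreviousHop:
    """Toy sender that keeps a shadow copy until delivery is confirmed."""
    shadow: dict = field(default_factory=dict)  # message -> name of hub holding it

    def submit(self, message: str, hubs: list) -> bool:
        """Hand the message to the first hub that accepts it."""
        for hub in hubs:
            if hub.accept(message):
                self.shadow[message] = hub.name
                return True
        return False

    def confirm(self, message: str) -> None:
        """The hub delivered the message onward, so drop the shadow copy."""
        self.shadow.pop(message, None)

    def resubmit_from(self, failed: Hub, hubs: list) -> list:
        """Resubmit every shadowed message that the failed hub was holding."""
        stranded = [m for m, h in self.shadow.items() if h == failed.name]
        others = [h for h in hubs if h is not failed]
        return [m for m in stranded if self.submit(m, others)]


# Two hubs in the same site: the message survives the failure of HUB-A.
hub_a, hub_b = Hub("HUB-A"), Hub("HUB-B")
hop = PreviousHop()
hop.submit("msg-1", [hub_a, hub_b])
hub_a.healthy = False
rescued = hop.resubmit_from(hub_a, [hub_a, hub_b])

# One hub only: there is nowhere to resubmit, so the message stays stranded.
solo = Hub("SOLO")
lonely_hop = PreviousHop()
lonely_hop.submit("msg-2", [solo])
solo.healthy = False
stuck = lonely_hop.resubmit_from(solo, [solo])
```

In this model `rescued` contains `msg-1` while `stuck` is empty, which is the point of the caveat: the shadow copy only helps if there is a second hub in the site to take the message.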
So where's the issue? Nic defined operational maturity as the absence or presence of the technology and processes required to absorb and mitigate a failure within an acceptable time frame (normally the SLA). He also made the point that Exchange is really good at absorbing failure if it's built to do so. Shadow transport on Hub Transport servers is one of those features. In simple terms, if the failure of a Hub Transport server is detected, the messages that server was responsible for are reassigned by the previous hop or message originator to another Hub Transport server in the same site.
If operational maturity is low, i.e. if an organization doesn't have sufficient reporting and remedial measures in place to determine that failures have occurred, then under extreme circumstances Hub Transport servers could keep failing unnoticed until mail flow stops altogether. We can mitigate this in part by introducing message routing intelligence on the outside of the organization. However, we need to make sure that we are not exacerbating the problem by simply relieving a pain point without addressing the cause.
Moving on, where do we start with operational maturity? We don't have to start with a full-blown SCOM implementation to ensure mail is flowing between two points. If you have nothing at all, start with:
  • Simple mail testing tools that send mail between two mailboxes at nominated points in the org.
  • Free tools and/or scripts that ping servers at regular intervals, check service availability, etc.
  • More tools or scripts to monitor disk availability and free disk space
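As a rough illustration of the starter checks listed above, here is a minimal Python sketch. Python is my choice for the example, not something this post prescribes. The host names, addresses and the use of TCP port 25 are assumptions to replace with your own values. Note that `send_probe` only confirms that a server accepted the test message; a full mail flow test would also check that it arrived in the destination mailbox.

```python
import shutil
import smtplib
import socket
from email.message import EmailMessage


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def free_space_percent(path: str) -> float:
    """Return the percentage of free space on the volume that holds path."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.free / usage.total


def build_probe(sender: str, recipient: str, tag: str) -> EmailMessage:
    """Build a small test message; the tag lets a later check find it."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"Mail flow probe {tag}"
    msg.set_content("Automated mail flow test. Safe to delete.")
    return msg


def send_probe(smtp_host: str, msg: EmailMessage, timeout: float = 10.0) -> bool:
    """Submit the probe over SMTP; True means the server accepted it."""
    try:
        with smtplib.SMTP(smtp_host, 25, timeout=timeout) as smtp:
            smtp.send_message(msg)
        return True
    except (OSError, smtplib.SMTPException):
        return False


if __name__ == "__main__":
    # Hypothetical host names; replace with your own Hub Transport servers.
    for host in ("hub01.example.com", "hub02.example.com"):
        print(host, "SMTP reachable:", port_is_open(host, 25))
    # Checks the volume holding the current directory.
    print("Free space %:", round(free_space_percent("."), 1))
```

Run on a schedule (Task Scheduler or cron) and written to a log or an alert mailbox, even checks this small give you a first signal that a server has stopped answering or a disk is filling up.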
Starting somewhere with basic belts and braces is often better than doing nothing, but it is critical to combine simple tools and simple processes into simple standards and thereby raise operational maturity from zero. Anything greater than zero is a short-term win and leaves your organization with something to work on and improve.